Search Results for author: Jeff M. Phillips

Found 28 papers, 4 papers with code

No Dimensional Sampling Coresets for Classification

no code implementations • 7 Feb 2024 • Meysam Alishahi, Jeff M. Phillips

We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework.

Classification

On Mergable Coresets for Polytope Distance

no code implementations • 8 Nov 2023 • Benwei Shi, Aditya Bhaskara, Wai Ming Tai, Jeff M. Phillips

We show that a constant-size constant-error coreset for polytope distance is simple to maintain under merges of coresets.

An Efficient Content-based Time Series Retrieval System

no code implementations • 5 Oct 2023 • Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Junpeng Wang, Vivian Lai, Yujie Fan, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang, Jeff M. Phillips

A Content-based Time Series Retrieval (CTSR) system is an information retrieval system for users to interact with time series that emerge from multiple domains, such as finance, healthcare, and manufacturing.

Information Retrieval, Retrieval, +1

For Kernel Range Spaces a Constant Number of Queries Are Sufficient

no code implementations • 28 Jun 2023 • Jeff M. Phillips, Hasan Pourmahmood-Aghababa

For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p, x_i)$ for $x_i \in X$.
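The query vector described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_kernel` and `kernel_query` are all assumptions chosen for the example, since the snippet only states that $(R_p)_i = K(p, x_i)$.

```python
import math

def gaussian_kernel(p, x, sigma=1.0):
    """Gaussian kernel K(p, x) = exp(-||p - x||^2 / (2 sigma^2)).

    The choice of kernel and bandwidth is an assumption for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, x))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def kernel_query(p, X, kernel=gaussian_kernel):
    """Return R_p, the length-n vector whose i-th coordinate is K(p, x_i)."""
    return [kernel(p, x) for x in X]

# A small point set X of size n = 3 and one query point p.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
R_p = kernel_query((0.0, 0.0), X)
```

Each query returns one coordinate per point of $X$, so `R_p` always has length $n$.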

Linear Distance Metric Learning with Noisy Labels

no code implementations • 5 Jun 2023 • Meysam Alishahi, Anna Little, Jeff M. Phillips

In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible.

Learning with noisy labels, Metric Learning

Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

no code implementations • 26 May 2023 • Tao Yang, Cuize Han, Chen Luo, Parth Gupta, Jeff M. Phillips, Qingyao Ai

While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of LTR algorithms, we argue that existing LTR algorithms that indiscriminately treat behavior and non-behavior signals in input features could lead to suboptimal performance in practice.

Learning-To-Rank, Recommendation Systems

Batch Multi-Fidelity Active Learning with Budget Constraints

no code implementations • 23 Oct 2022 • Shibo Li, Jeff M. Phillips, Xin Yu, Robert M. Kirby, Shandian Zhe

However, this method queries only one pair of fidelity and input at a time, and hence risks bringing in strongly correlated examples that reduce the learning efficiency.

Active Learning

Classifying Spatial Trajectories

1 code implementation • 3 Sep 2022 • Hasan Pourmahmood-Aghababa, Jeff M. Phillips

We provide the first comprehensive study on how to classify trajectories using only their spatial representations, measured on 5 real-world data sets.

Self-Adaptable Point Processes with Nonparametric Time Decays

no code implementations • NeurIPS 2021 • Zhimeng Pan, Zheng Wang, Jeff M. Phillips, Shandian Zhe

Specifically, we use an embedding to represent each event type and model the event influence as an unknown function of the embeddings and time span.

Point Processes

Practical and Configurable Network Traffic Classification Using Probabilistic Machine Learning

no code implementations • 10 Jul 2021 • Jiahui Chen, Joe Breen, Jeff M. Phillips, Jacobus Van der Merwe

Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks.

BIG-bench Machine Learning, Classification, +2

Approximate Maximum Halfspace Discrepancy

no code implementations • 25 Jun 2021 • Michael Matheny, Jeff M. Phillips

For different classes of $\Phi$ we can either provide an $\Omega(|X|^{3/2 - o(1)})$ time lower bound for the exact solution with a reduction to APSP, or an $\Omega(|X| + 1/\varepsilon^{2-o(1)})$ lower bound for the approximate solution with a reduction to 3SUM.

Anomaly Detection

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

1 code implementation • 6 Apr 2021 • Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

To aid this, we present Visualization of Embedding Representations for deBiasing system ("VERB"), an open-source web-based visualization tool that helps the users gain a technical understanding and visual intuition of the inner workings of debiasing techniques, with a focus on their geometric properties.

Decision Making, Dimensionality Reduction, +3

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

1 code implementation • EMNLP 2021 • Sunipa Dev, Tao Li, Jeff M. Phillips, Vivek Srikumar

Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks.

Word Embeddings

A Deterministic Streaming Sketch for Ridge Regression

1 code implementation • 5 Feb 2020 • Benwei Shi, Jeff M. Phillips

We provide a deterministic space-efficient algorithm for estimating ridge regression.

regression

Constrained Non-Affine Alignment of Embeddings

no code implementations • 13 Oct 2019 • Yuwei Wang, Yan Zheng, Yanqing Peng, Chin-Chia Michael Yeh, Zhongfang Zhuang, Das Mahashweta, Bendre Mangesh, Feifei Li, Wei Zhang, Jeff M. Phillips

Embeddings are already essential tools for large language models and image analysis, and their use is being extended to many other research domains.

The Kernel Spatial Scan Statistic

no code implementations • 13 Jun 2019 • Mingxuan Han, Michael Matheny, Jeff M. Phillips

Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations while also having numerous applications in epidemiology, environmental monitoring, and homeland security.

Epidemiology

The GaussianSketch for Almost Relative Error Kernel Distance

no code implementations • 9 Nov 2018 • Jeff M. Phillips, Wai Ming Tai

We introduce two versions of a new sketch for approximately embedding the Gaussian kernel into Euclidean inner product space.

Closed Form Word Embedding Alignment

no code implementations • 4 Jun 2018 • Sunipa Dev, Safia Hassan, Jeff M. Phillips

We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec).

Word Embeddings

Simple Distances for Trajectories via Landmarks

no code implementations • 30 Apr 2018 • Jeff M. Phillips, Pingfan Tang

We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks.
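One way to picture a landmark-based distance is to map each trajectory to the vector of its distances to a fixed landmark set, then compare those vectors. The sketch below is an illustration under stated assumptions, not the paper's definition: trajectories are 2D polylines, the vectors are compared with a root-mean-square difference, and the names `point_segment_distance`, `landmark_vector`, and `landmark_distance` are hypothetical.

```python
import math

def point_segment_distance(q, a, b):
    """Euclidean distance from point q to the segment from a to b (2D)."""
    (qx, qy), (ax, ay), (bx, by) = q, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(qx - ax, qy - ay)
    # Project q onto the supporting line and clamp to the segment.
    t = ((qx - ax) * dx + (qy - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(qx - (ax + t * dx), qy - (ay + t * dy))

def landmark_vector(trajectory, landmarks):
    """Map a polyline to its vector of minimum distances to each landmark."""
    segments = list(zip(trajectory, trajectory[1:]))
    if not segments:
        # A single waypoint is treated as a degenerate segment.
        segments = [(trajectory[0], trajectory[0])]
    return [min(point_segment_distance(q, a, b) for a, b in segments)
            for q in landmarks]

def landmark_distance(traj1, traj2, landmarks):
    """Root-mean-square difference of the two landmark vectors.

    The 1/|Q| normalization is an assumption made for this sketch.
    """
    v1 = landmark_vector(traj1, landmarks)
    v2 = landmark_vector(traj2, landmarks)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)) / len(landmarks))

landmarks = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
t1 = [(0.0, 1.0), (2.0, 1.0)]  # horizontal segment at y = 1
t2 = [(0.0, 2.0), (2.0, 2.0)]  # horizontal segment at y = 2
d = landmark_distance(t1, t2, landmarks)
```

Because every object is reduced to a fixed-length vector, the same comparison would apply to lines or hyperplanes once a point-to-object distance is supplied.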

Clustering

Near-Optimal Coresets of Kernel Density Estimates

no code implementations • 6 Feb 2018 • Jeff M. Phillips, Wai Ming Tai

When $d\geq 1/\varepsilon^2$, it is known that the size of the coreset can be $O(1/\varepsilon^2)$.

Improved Coresets for Kernel Density Estimates

no code implementations • 11 Oct 2017 • Jeff M. Phillips, Wai Ming Tai

When the dimension $d$ is constant, we demonstrate much tighter bounds on the size of the coreset specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles.

Coresets for Kernel Regression

no code implementations • 13 Feb 2017 • Yan Zheng, Jeff M. Phillips

Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular among time series and spatial data.

regression, Time Series, +1

Relative Error Embeddings for the Gaussian Kernel Distance

no code implementations • 17 Feb 2016 • Di Chen, Jeff M. Phillips

A reproducing kernel can define an embedding of a data point into an infinite dimensional reproducing kernel Hilbert space (RKHS).

Streaming Kernel Principal Component Analysis

no code implementations • 16 Dec 2015 • Mina Ghashami, Daniel Perry, Jeff M. Phillips

Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning.

Subsampling in Smoothed Range Spaces

no code implementations • 30 Oct 2015 • Jeff M. Phillips, Yan Zheng

We consider smoothed versions of geometric range spaces, so an element of the ground set (e.g., a point) can be contained in a range with a non-binary value in $[0, 1]$.

Frequent Directions : Simple and Deterministic Matrix Sketching

no code implementations • 8 Jan 2015 • Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff

It performs $O(d \times \ell)$ operations per row and maintains a sketch matrix $B \in \mathbb{R}^{\ell \times d}$ such that for any $k < \ell$, $\|A^TA - B^TB \|_2 \leq \|A - A_k\|_F^2 / (\ell-k)$ and $\|A - \pi_{B_k}(A)\|_F^2 \leq \big(1 + \frac{k}{\ell-k}\big) \|A-A_k\|_F^2$.
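As a worked instance of the two bounds quoted above, it may help to plug in a specific sketch size. The parameter $\varepsilon > 0$ and the choice $\ell = k + k/\varepsilon$ are assumptions introduced here for illustration; they do not appear in the snippet.

```latex
% Assume \ell = k + k/\varepsilon, so that \ell - k = k/\varepsilon.
\begin{align*}
\|A^TA - B^TB\|_2
  &\leq \frac{\|A - A_k\|_F^2}{\ell - k}
   = \frac{\varepsilon}{k}\,\|A - A_k\|_F^2, \\
\|A - \pi_{B_k}(A)\|_F^2
  &\leq \Big(1 + \frac{k}{\ell - k}\Big)\|A - A_k\|_F^2
   = (1 + \varepsilon)\,\|A - A_k\|_F^2 .
\end{align*}
```

Under this choice the sketch has $\ell = k(1 + 1/\varepsilon)$ rows, and the second inequality becomes a $(1+\varepsilon)$ relative-error guarantee for the rank-$k$ projection.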

Data Structures and Algorithms 68W40 (Primary)
