Search Results for author: Sewoong Oh

Found 104 papers, 42 papers with code

Randomization Techniques to Mitigate the Risk of Copyright Infringement

no code implementations21 Aug 2024 Wei-Ning Chen, Peter Kairouz, Sewoong Oh, Zheng Xu

In this paper, we investigate potential randomization approaches that can complement current practices of input-based methods (such as licensing data and prompt filtering) and output-based methods (such as recitation checkers, license checkers, and model-based similarity scores) for copyright protection.

Better Alignment with Instruction Back-and-Forth Translation

no code implementations8 Aug 2024 Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li

We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs).

Diversity Translation +1

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

1 code implementation23 Jul 2024 Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith

Our key insight is that the ordered list of merge rules learned by a BPE tokenizer naturally reveals information about the token frequencies in its training data.
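
To make the insight concrete, here is a toy sketch (not the paper's attack, which goes further and infers training-data mixture proportions from a tokenizer's merge list): a BPE trainer greedily merges the most frequent symbol pair at each step, so the order of learned merges is itself a record of pair frequencies in the training corpus.

```python
# Toy BPE trainer: each merge is the currently most frequent symbol pair,
# so the merge order leaks frequency information about the corpus.
from collections import Counter

def train_bpe(corpus, num_merges):
    words = Counter(tuple(w) for w in corpus)  # word (as symbol tuple) -> count
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for word, freq in words.items():
            for pair in zip(word, word[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = pair_counts.most_common(1)[0][0]  # greedy: most frequent pair
        merges.append(best)
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

# Corpora with different mixtures yield visibly different merge orders.
print(train_bpe(["low", "lower", "lowest"] * 50 + ["new", "newer"] * 5, 5))
print(train_bpe(["new", "newer", "newest"] * 50 + ["low", "lower"] * 5, 5))
```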

Understanding the Gains from Repeated Self-Distillation

no code implementations5 Jul 2024 Divyansh Pareek, Simon S. Du, Sewoong Oh

Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model.

Knowledge Distillation regression
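
Since the abstract defines self-distillation, a minimal one-round sketch for ridge regression (the setting the paper analyzes) may help; the regularization strength below is an arbitrary illustrative choice. Repeated self-distillation iterates steps 2 and 3.

```python
# One round of self-distillation for ridge regression: the student has the
# same model class as the teacher and is fit to the teacher's predictions.
import numpy as np

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

w_teacher = ridge(X, y, lam=1.0)        # step 1: fit on the real labels
y_soft = X @ w_teacher                  # step 2: teacher's predictions
w_student = ridge(X, y_soft, lam=1.0)   # step 3: refit on the soft labels

print(np.linalg.norm(w_teacher - w_true), np.linalg.norm(w_student - w_true))
```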

PLeaS -- Merging Models with Permutations and Least Squares

no code implementations2 Jul 2024 Anshul Nasery, Jonathan Hayase, Pang Wei Koh, Sewoong Oh

Furthermore, the final merged model is typically restricted to be of the same size as the original models.

Multilingual Diversity Improves Vision-Language Representations

no code implementations27 May 2024 Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.

Diversity Text Retrieval

AirGapAgent: Protecting Privacy-Conscious Conversational Agents

no code implementations8 May 2024 Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns.

Language Modelling Large Language Model

Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

no code implementations2 May 2024 Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
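
For orientation, here is the textbook baseline this line of work improves on: central-DP $L_2$ mean estimation via the Gaussian mechanism, ignoring communication constraints entirely. The clipping norm and the classic calibration $\sigma = C\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $\varepsilon \le 1$) are standard choices, not the paper's scheme.

```python
# Baseline only: clip each vector to L2 norm C (the L2 sensitivity of the
# sum is then C) and add Gaussian noise calibrated for (eps, delta)-DP.
import numpy as np

def dp_mean_l2(X, C=1.0, eps=1.0, delta=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_clip = X * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    sigma = C * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return (X_clip.sum(axis=0) + rng.normal(scale=sigma, size=d)) / n

X = np.random.default_rng(1).normal(size=(10_000, 16)) * 0.1
print(np.linalg.norm(dp_mean_l2(X) - X.mean(axis=0)))  # small estimation error
```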

Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

no code implementations23 Apr 2024 Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith

We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix.

On the Convergence of Differentially-Private Fine-tuning: To Linearly Probe or to Fully Fine-tune?

no code implementations29 Feb 2024 Shuqi Ke, Charlie Hou, Giulia Fanti, Sewoong Oh

We provide theoretical insights into the convergence of DP fine-tuning within an overparameterized neural network and establish a utility curve that determines the allocation of privacy budget between linear probing and full fine-tuning.

Privacy-Preserving Instructions for Aligning Large Language Models

1 code implementation21 Feb 2024 Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu

Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users' intentions.

Language Modelling Large Language Model +1

DPZero: Private Fine-Tuning of Language Models without Backpropagation

1 code implementation14 Oct 2023 Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy.

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

no code implementations6 Oct 2023 Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim

In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge.

Benchmarking Federated Learning +3

Private Federated Learning with Autotuned Compression

1 code implementation20 Jul 2023 Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh

We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates.

Federated Learning

Why Is Public Pretraining Necessary for Private Model Training?

no code implementations19 Feb 2023 Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training requires the optimization algorithm to go through two phases.

Transfer Learning

One-shot Empirical Privacy Estimation for Federated Learning

1 code implementation6 Feb 2023 Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, H. Brendan McMahan, Vinith M. Suriyakumar

Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or to empirically measure privacy loss in settings where known analytical bounds are not tight.

Federated Learning

Near Optimal Private and Robust Linear Regression

no code implementations30 Jan 2023 Xiyang Liu, Prateek Jain, Weihao Kong, Sewoong Oh, Arun Sai Suggala

Under label-corruption, this is the first efficient linear regression algorithm to guarantee both $(\varepsilon,\delta)$-DP and robustness.

regression

Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes

no code implementations16 Jan 2023 Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

Next, we derive the soft-decision based version of our algorithm, called soft-subRPA, that not only improves upon the performance of subRPA but also enables a differentiable decoding algorithm.

Sequence-to-sequence translation from mass spectra to peptides with a transformer model

2 code implementations bioRxiv 2023 Melih Yilmaz, William E. Fondrie, Wout Bittremieux, Rowan Nelson, Varun Ananth, Sewoong Oh, William Stafford Noble

A fundamental challenge for any mass spectrometry-based proteomics experiment is the identification of the peptide that generated each acquired tandem mass spectrum.

de novo peptide sequencing

MAUVE Scores for Generative Models: Theory and Practice

1 code implementation30 Dec 2022 Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui

We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.

Quantization
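
The authors maintain an open-source implementation (pip install mauve-text); a usage sketch follows, with argument names taken from the project README at the time of writing, so they may drift. Note that device_id selects a GPU for featurization.

```python
# Compare model generations against human-written references with MAUVE.
import mauve

p_text = ["The quick brown fox jumps over the lazy dog."] * 200   # references
q_text = ["The quick brown fox leaped over a lazy dog."] * 200    # generations

out = mauve.compute_mauve(p_text=p_text, q_text=q_text,
                          device_id=0, max_text_length=256, verbose=False)
print(out.mauve)  # scalar in (0, 1]; higher means the distributions are closer
```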

Learning to Generate Image Embeddings with User-level Differential Privacy

1 code implementation CVPR 2023 Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan

Small on-device models have been successfully trained with user-level differential privacy (DP) for next word prediction and image classification tasks in the past.

Federated Learning Image Classification

Zonotope Domains for Lagrangian Neural Network Verification

1 code implementation14 Oct 2022 Matt Jordan, Jonathan Hayase, Alexandros G. Dimakis, Sewoong Oh

Neural network verification aims to provide provable bounds for the output of a neural network for a given input range.

Few-shot Backdoor Attacks via Neural Tangent Kernels

1 code implementation12 Oct 2022 Jonathan Hayase, Sewoong Oh

In a backdoor attack, an attacker injects corrupted examples into the training set.

Backdoor Attack Bilevel Optimization
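
The abstract's threat model in a few lines, as a hedged sketch: stamp a fixed trigger onto a handful of training images and relabel them with the attacker's target class. The paper's contribution, selecting which examples to corrupt via neural tangent kernels, is not shown here; trigger shape and counts are illustrative.

```python
# Inject a simple patch-trigger backdoor into a training set.
import numpy as np

def inject_backdoor(images, labels, target_class, num_poison, rng):
    idx = rng.choice(len(images), size=num_poison, replace=False)
    poisoned, new_labels = images.copy(), labels.copy()
    poisoned[idx, -3:, -3:] = 1.0     # 3x3 white corner square = the trigger
    new_labels[idx] = target_class    # attacker-chosen label
    return poisoned, new_labels

rng = np.random.default_rng(0)
X = rng.random((1000, 28, 28))
y = rng.integers(0, 10, size=1000)
Xp, yp = inject_backdoor(X, y, target_class=7, num_poison=10, rng=rng)
```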

Stochastic optimization on matrices and a graphon McKean-Vlasov limit

no code implementations2 Oct 2022 Zaid Harchaoui, Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi

We consider stochastic gradient descent on suitable functions on the space of large symmetric matrices that are invariant under permuting the rows and columns by the same permutation.

Stochastic Optimization

CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family

1 code implementation1 Oct 2022 S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod Viswanath

We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32, 16) and Polar(64, 22) codes.

Decoder

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

1 code implementation10 Aug 2022 Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt

Web-crawled datasets have enabled remarkable generalization capabilities in recent image-text models such as CLIP (Contrastive Language-Image pre-training) or Flamingo, but little is known about the dataset creation processes.

Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization

no code implementations1 Jun 2022 Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables the practitioners to bring their own base optimization algorithm and use it as a black-box to obtain the near-optimal privacy-loss trade-off.

DP-PCA: Statistically Optimal and Differentially Private PCA

no code implementations27 May 2022 Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh

For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=\tilde O(d)$.

Towards a Defense Against Federated Backdoor Attacks Under Continuous Training

1 code implementation24 May 2022 Shuaiqi Wang, Jonathan Hayase, Giulia Fanti, Sewoong Oh

We propose shadow learning, a framework for defending against backdoor attacks in the FL setting under long-range training.

Continual Learning Federated Learning

MAML and ANIL Provably Learn Representations

no code implementations7 Feb 2022 Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai

Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks.

Diversity Few-Shot Learning +1

Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization

no code implementations19 Jan 2022 Kiran Koshy Thekumparampil, Niao He, Sewoong Oh

We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$.

Statistically and Computationally Efficient Linear Meta-representation Learning

no code implementations NeurIPS 2021 Kiran K. Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

To cope with such data scarcity, meta-representation learning methods train across many related tasks to find a shared (lower-dimensional) representation of the data where all tasks can be solved accurately.

Few-Shot Learning Representation Learning

Gradient flows on graphons: existence, convergence, continuity equations

no code implementations18 Nov 2021 Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi

Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems.

Differential privacy and robust statistics in high dimensions

no code implementations12 Nov 2021 Xiyang Liu, Weihao Kong, Sewoong Oh

The key insight is that if we design an exponential mechanism that accesses the data only via one-dimensional robust statistics, then the resulting local sensitivity can be dramatically reduced.

Vocal Bursts Intensity Prediction
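
The simplest instance of the recipe in this abstract is releasing a median, a one-dimensional robust statistic, through the exponential mechanism; a sketch follows, where the data range [0, 1] and the grid discretization are assumptions made for illustration.

```python
# eps-DP median via the exponential mechanism: the utility of a candidate y
# is minus its rank distance to the median, which changes by at most 1 when
# one data point changes (sensitivity 1), so exp(eps * u / 2) sampling is DP.
import numpy as np

def dp_median(x, eps, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 1001)        # assumed data range [0, 1]
    x = np.sort(x)
    below = np.searchsorted(x, grid)          # points strictly below each y
    utility = -np.abs(below - len(x) / 2)     # rank distance to the median
    logits = eps * utility / 2
    probs = np.exp(logits - logits.max())
    return rng.choice(grid, p=probs / probs.sum())

x = np.random.default_rng(1).beta(2, 5, size=500)  # data in [0, 1]
print(np.median(x), dp_median(x, eps=1.0))
```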

Gradient Inversion with Generative Image Prior

1 code implementation NeurIPS 2021 Jinwoo Jeon, Jaechang Kim, Kangwook Lee, Sewoong Oh, Jungseul Ok

Federated Learning (FL) is a distributed learning framework, in which the local data never leaves clients' devices to preserve privacy, and the server trains models on the data by accessing only the gradients of those local data.

Federated Learning

KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning

1 code implementation29 Aug 2021 Ashok Vardhan Makkuva, Xiyang Liu, Mohammad Vahid Jamali, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

In this paper, we construct KO codes, a computationally efficient family of deep-learning driven (encoder, decoder) pairs that outperform the state-of-the-art reliability performance on the standardized AWGN channel.

Benchmarking Decoder

FedChain: Chained Algorithms for Near-Optimal Communication Cost in Federated Learning

no code implementations ICLR 2022 Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh

We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of the number of communication rounds $R$, while leveraging the similarity between clients.

Federated Learning Image Classification

Sample Efficient Linear Meta-Learning by Alternating Minimization

no code implementations18 May 2021 Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

We show that, for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.

Meta-Learning

SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

1 code implementation22 Apr 2021 Jonathan Hayase, Weihao Kong, Raghav Somani, Sewoong Oh

There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones.

Robust and Differentially Private Mean Estimation

1 code implementation NeurIPS 2021 Xiyang Liu, Weihao Kong, Sham Kakade, Sewoong Oh

In statistical learning and analysis from shared data, which is increasingly widely adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness.

Federated Learning Meta-Learning

Efficient Algorithms for Federated Saddle Point Optimization

no code implementations12 Feb 2021 Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh

Our goal is to design an algorithm that can harness the benefit of similarity in the clients while recovering the Minibatch Mirror-prox performance under arbitrary heterogeneity (up to log factors).

Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise

no code implementations ICLR 2022 Xingyu Wang, Sewoong Oh, Chang-Han Rhee

The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization.

Deep Learning

Reed-Muller Subcodes: Machine Learning-Aided Design of Efficient Soft Recursive Decoding

no code implementations2 Feb 2021 Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath

To lower the complexity of our decoding algorithm, referred to as subRPA in this paper, we investigate different ways for pruning the projections.

Information Theory

Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method

no code implementations NeurIPS 2020 Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

Further, if instead of a PO we only have a linear minimization oracle (LMO, a la Frank-Wolfe) to access the constraint set, an extension of our method, MOLES, finds a feasible $\epsilon$-suboptimal solution using $O(\epsilon^{-2})$ LMO calls and FO calls; both match known lower bounds, resolving a question left open since White (1993).

Deepcode and Modulo-SK are Designed for Different Settings

no code implementations18 Aug 2020 Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

DeepCode is designed and evaluated for the AWGN channel with (potentially delayed) uncoded output feedback.

Robust Meta-learning for Mixed Linear Regression with Small Batches

no code implementations NeurIPS 2020 Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh

Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $\Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(\log k)$.

Meta-Learning regression

Meta-learning for mixed linear regression

no code implementations ICML 2020 Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh

In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.

Meta-Learning regression +1

Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases

1 code implementation NeurIPS 2019 Xiyang Liu, Sewoong Oh

We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required.

Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels

1 code implementation NeurIPS 2019 Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications.

Decoder

Towards Principled Objectives for Contrastive Disentanglement

no code implementations25 Sep 2019 Anwesa Choudhuri, Ashok Vardhan Makkuva, Ranvir Rana, Sewoong Oh, Girish Chowdhary, Alexander Schwing

In fact, contrastive disentanglement and unsupervised recovery are often combined, in that we seek additional variations that exhibit salient factors/properties.

Disentanglement

Optimal transport mapping via input convex neural networks

2 code implementations ICML 2020 Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee

Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping.
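
A minimal sketch of the ingredient named in the abstract: an input convex neural network (ICNN) whose gradient serves as the candidate transport map. The architecture sizes and initialization below are illustrative choices; training the potential against a dual OT objective, as the paper does, is omitted.

```python
# ICNN: f(x) is convex in x because the z-path weights are kept non-negative
# (via softplus) and the activations are convex and non-decreasing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, dim, hidden=64, depth=3):
        super().__init__()
        self.x_layers = nn.ModuleList([nn.Linear(dim, hidden) for _ in range(depth)])
        self.z_weights = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(hidden, hidden)) for _ in range(depth - 1)])
        self.w_out = nn.Parameter(0.01 * torch.randn(hidden))

    def forward(self, x):
        z = F.relu(self.x_layers[0](x))                  # convex in x
        for lin, W in zip(self.x_layers[1:], self.z_weights):
            z = F.relu(lin(x) + z @ F.softplus(W))       # stays convex in x
        return z @ F.softplus(self.w_out)                # scalar convex output

def transport_map(f, x):
    # Brenier-style map: the gradient of a convex potential.
    x = x.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x).sum(), x)
    return grad

f = ICNN(dim=2)
print(transport_map(f, torch.randn(5, 2)).shape)  # torch.Size([5, 2])
```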

Efficient Algorithms for Smooth Minimax Optimization

2 code implementations NeurIPS 2019 Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh

This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x, y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$.
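
For contrast with the paper's accelerated methods, here is the plain simultaneous gradient descent-ascent baseline on a toy problem of exactly this form, smooth and concave in $y$; the objective and step size are illustrative.

```python
# Simultaneous gradient descent-ascent on g(x, y) = x^2 + x*y - y^2,
# which has its saddle point at (0, 0).
def grad_x(x, y): return 2 * x + y   # descend in x
def grad_y(x, y): return x - 2 * y   # ascend in y (g is concave in y)

x, y, eta = 3.0, -2.0, 0.1
for _ in range(200):
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
print(x, y)  # approaches the saddle point (0, 0)
```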

InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs

1 code implementation14 Jun 2019 Zinan Lin, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh

Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution.

Disentanglement Model Selection

Robust conditional GANs under missing or uncertain labels

no code implementations9 Jun 2019 Kiran Koshy Thekumparampil, Sewoong Oh, Ashish Khetan

Matching the performance of conditional Generative Adversarial Networks with little supervision is an important task, especially when venturing into new domains.

Learning in Gated Neural Networks

no code implementations6 Jun 2019 Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks.

Minimax Rates of Estimating Approximate Differential Privacy

2 code implementations24 May 2019 Xiyang Liu, Sewoong Oh

We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required.

Information Theory

DeepTurbo: Deep Turbo Decoder

1 code implementation6 Mar 2019 Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

We focus on Turbo codes and propose DeepTurbo, a novel deep learning based architecture for Turbo decoding.

Decoder

Number of Connected Components in a Graph: Estimation via Counting Patterns

no code implementations1 Dec 2018 Ashish Khetan, Harshay Shah, Sewoong Oh

This representation is crucial in introducing a novel estimator for the number of connected components for general graphs, under the knowledge of the spectral gap of the original graph.

LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks

1 code implementation30 Nov 2018 Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards.

Decoder

Robustness of Conditional GANs to Noisy Labels

2 code implementations NeurIPS 2018 Kiran Koshy Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh

When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN).

Rate Distortion For Model Compression: From Theory To Practice

no code implementations9 Oct 2018 Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh

Theoretically, we prove that the proposed scheme is optimal for compressing one-hidden-layer ReLU neural networks.

Data Compression Model Compression +1

Learning One-hidden-layer Neural Networks under General Input Distributions

no code implementations9 Oct 2018 Weihao Gao, Ashok Vardhan Makkuva, Sewoong Oh, Pramod Viswanath

Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points.

Deepcode: Feedback Codes via Deep Learning

1 code implementation NeurIPS 2018 Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications.

Deep Learning

Communication Algorithms via Deep Learning

3 code implementations ICLR 2018 Hyeji Kim, Yihan Jiang, Ranvir Rana, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

We show that creatively designed and trained RNN architectures can decode well known sequential codes such as the convolutional and turbo codes with close to optimal performance on the additive white Gaussian noise (AWGN) channel, which itself is achieved by breakthrough algorithms of our times (Viterbi and BCJR decoders, representing dynamic programming and forward-backward algorithms).

Deep Learning Ingenuity

Attention-based Graph Neural Network for Semi-supervised Learning

1 code implementation ICLR 2018 Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li

Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches.

Graph Neural Network Graph Regression

Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms

no code implementations21 Feb 2018 Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters.

Ensemble Learning

PacGAN: The power of two samples in generative adversarial networks

7 code implementations NeurIPS 2018 Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh

Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples.

Diversity Two-sample testing +1
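
The core packing trick in a few lines, as a sketch: the discriminator scores m samples jointly, so a mode-collapsed generator, whose packs look suspiciously alike, is easier to reject. Sizes below are arbitrary.

```python
# PacGAN-style "packing": the discriminator sees m concatenated samples.
import torch
import torch.nn as nn

m, dim = 2, 64                                   # pack size and sample dim
disc = nn.Sequential(nn.Linear(m * dim, 128), nn.ReLU(), nn.Linear(128, 1))

real = torch.randn(32 * m, dim)                  # a batch of real samples
packed = real.view(32, m * dim)                  # concatenate m samples each
print(disc(packed).shape)                        # one score per pack: [32, 1]
```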

Optimal Sample Complexity of M-wise Data for Top-K Ranking

no code implementations NeurIPS 2017 Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh

As our main result, we characterize the minimax-optimal sample size for top-$K$ ranking.

Matrix Norm Estimation from a Few Entries

1 code implementation NeurIPS 2017 Ashish Khetan, Sewoong Oh

This paper focuses on the technical challenges in accurately estimating the Schatten norms from a sampling of a matrix.

Collaborative Filtering Matrix Completion

Estimating Mutual Information for Discrete-Continuous Mixtures

1 code implementation NeurIPS 2017 Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

We provide numerical experiments suggesting the superiority of the proposed estimator over two natural heuristics: adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, or quantizing the samples and applying standard estimators tailored for purely discrete variables.

Clustering Mutual Information Estimation

Learning from Comparisons and Choices

no code implementations24 Apr 2017 Sahand Negahban, Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu

This also allows one to compute similarities among users and items to be used for categorization and search.

Marketing Recommendation Systems

Spectrum Estimation from a Few Entries

no code implementations18 Mar 2017 Ashish Khetan, Sewoong Oh

We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values.

Collaborative Filtering Matrix Completion

Iterative Bayesian Learning for Crowdsourced Regression

no code implementations28 Feb 2017 Jungseul Ok, Sewoong Oh, Yunhun Jang, Jinwoo Shin, Yung Yi

Crowdsourcing platforms emerged as popular venues for purchasing human intelligence at low cost for large volumes of tasks.

regression

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

no code implementations NeurIPS 2016 Weihao Gao, Sewoong Oh, Pramod Viswanath

In this paper, we combine both these approaches to design new estimators of entropy and mutual information that outperform state of the art methods.
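
For context, here is the classical fixed-$k$ nearest-neighbor (Kozachenko-Leonenko) entropy estimator that this line of work builds on and improves; $k = 3$ is an arbitrary choice.

```python
# Kozachenko-Leonenko entropy estimator (in nats) from k-NN distances.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(X, k=3):
    n, d = X.shape
    r = cKDTree(X).query(X, k + 1)[0][:, k]                 # k-th NN distance
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)   # log vol, unit ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

X = np.random.default_rng(0).normal(size=(5000, 2))
print(kl_entropy(X))  # close to the true Gaussian entropy, ~2.84 nats
```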

Computational and Statistical Tradeoffs in Learning to Rank

no code implementations NeurIPS 2016 Ashish Khetan, Sewoong Oh

For massive and heterogeneous modern datasets, it is of fundamental interest to provide guarantees on the accuracy of estimation when computational resources are limited.

Learning-To-Rank

Demystifying Fixed k-Nearest Neighbor Information Estimators

1 code implementation11 Apr 2016 Weihao Gao, Sewoong Oh, Pramod Viswanath

In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the bias as a function of the number of samples.

Top-$K$ Ranking from Pairwise Comparisons: When Spectral Ranking is Optimal

no code implementations14 Mar 2016 Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh

First, in a general comparison model where item pairs to compare are given a priori, we attain an upper and lower bound on the sample size for reliable recovery of the top-$K$ ranked items.

Achieving Budget-optimality with Adaptive Schemes in Crowdsourcing

no code implementations NeurIPS 2016 Ashish Khetan, Sewoong Oh

Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy.

Conditional Dependence via Shannon Capacity: Axioms, Estimators and Applications

no code implementations10 Feb 2016 Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath

We conduct an axiomatic study of the problem of estimating the strength of a known causal relationship between a pair of variables.

Data-driven Rank Breaking for Efficient Rank Aggregation

no code implementations21 Jan 2016 Ashish Khetan, Sewoong Oh

Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference.

Secure Multi-party Differential Privacy

no code implementations NeurIPS 2015 Peter Kairouz, Sewoong Oh, Pramod Viswanath

In this setting, each party is interested in computing a function on its private bit and all the other parties' bits.

Collaboratively Learning Preferences from Ordinal Data

no code implementations NeurIPS 2015 Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu

In order to predict the preferences, we want to learn the underlying model from noisy observations of the low-rank matrix, collected as revealed preferences in various forms of ordinal data.

Collaborative Ranking Management +1

Spy vs. Spy: Rumor Source Obfuscation

no code implementations29 Dec 2014 Giulia Fanti, Peter Kairouz, Sewoong Oh, Pramod Viswanath

Whether for fear of judgment or personal endangerment, it is crucial to keep anonymous the identity of the user who initially posted a sensitive message.

Learning Mixed Multinomial Logit Model from Ordinal Data

no code implementations NeurIPS 2014 Sewoong Oh, Devavrat Shah

In the case of single MNL models (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible.

Management Tensor Decomposition

Minimax-optimal Inference from Partial Rankings

no code implementations NeurIPS 2014 Bruce Hajek, Sewoong Oh, Jiaming Xu

For a given assignment of items to users, we first derive an oracle lower bound of the estimation error that holds even for the more general Thurstone models.

Provable Tensor Factorization with Missing Data

1 code implementation NeurIPS 2014 Prateek Jain, Sewoong Oh

We show that under certain standard assumptions, our method can recover a three-mode $n\times n\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \log^4 n)$ randomly sampled entries.

Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

no code implementations12 Nov 2013 Prateek Jain, Sewoong Oh

The main challenge in learning mixtures of discrete product distributions is that these low-rank tensors cannot be obtained directly from the sample moments.

Matrix Completion Recommendation Systems

The Composition Theorem for Differential Privacy

no code implementations4 Nov 2013 Peter Kairouz, Sewoong Oh, Pramod Viswanath

Sequential querying of differentially private mechanisms degrades the overall privacy level.

Data Structures and Algorithms Cryptography and Security Information Theory
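
For context, the two standard composition bounds that precede this paper's exact optimal curve, as straightforward arithmetic; the slack parameter below is an arbitrary choice.

```python
# Privacy after k adaptive uses of an (eps, delta)-DP mechanism.
import math

def basic_composition(eps, delta, k):
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_slack=1e-6):
    # Dwork-Rothblum-Vadhan: roughly sqrt(k)*eps growth for small eps.
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_slack

print(basic_composition(0.1, 1e-8, k=100))     # (10.0, 1e-06)
print(advanced_composition(0.1, 1e-8, k=100))  # ~(6.3, ...) for this slack
```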

Iterative ranking from pair-wise comparisons

no code implementations NeurIPS 2012 Sahand Negahban, Sewoong Oh, Devavrat Shah

In most settings, in addition to obtaining a ranking, finding ‘scores’ for each object (e.g., a player’s rating) is of interest for understanding the intensity of the preferences.

Rank Centrality: Ranking from Pair-wise Comparisons

no code implementations8 Sep 2012 Sahand Negahban, Sewoong Oh, Devavrat Shah

To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which each object has an associated score which determines the probabilistic outcomes of pair-wise comparisons between objects.
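
A sketch of the algorithm described in the abstract: pairwise win fractions define a random walk that drifts toward stronger items, and its stationary distribution recovers the BTL scores (exactly so in the noiseless case, by detailed balance). The comparison counts below are illustrative.

```python
# Rank Centrality sketch: scores = stationary distribution of a random walk.
import numpy as np

def rank_centrality(A, compared):
    # A[i, j]: fraction of comparisons between i and j won by j;
    # compared[i, j]: True if the pair (i, j) was compared at all.
    n = A.shape[0]
    d_max = compared.sum(axis=1).max()              # max comparison degree
    P = np.where(compared, A / d_max, 0.0)          # walk drifts to winners
    P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)
    pi = np.full(n, 1.0 / n)
    for _ in range(2000):                           # power iteration for the
        pi = pi @ P                                 # stationary distribution
    return pi / pi.sum()

rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, 4.0])                       # true BTL scores
n, m = len(w), 500                                  # m comparisons per pair
A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            A[i, j] = rng.binomial(m, w[j] / (w[i] + w[j])) / m
print(rank_centrality(A, ~np.eye(n, dtype=bool)))   # approx w / w.sum()
```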

Iterative Learning for Reliable Crowdsourcing Systems

no code implementations NeurIPS 2011 David R. Karger, Sewoong Oh, Devavrat Shah

Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading.

Image Classification Optical Character Recognition +1

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

no code implementations17 Oct 2011 David R. Karger, Sewoong Oh, Devavrat Shah

Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks.

Image Classification Optical Character Recognition +1

Matrix Completion from Noisy Entries

1 code implementation NeurIPS 2009 Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries.

Collaborative Filtering Matrix Completion
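
A minimal sketch of the spectral step underlying OptSpace, the algorithm studied in this line of work: zero-fill the unobserved entries, rescale by the inverse sampling rate, and project to rank r. The full algorithm also trims over-represented rows and columns and then refines the estimate by manifold optimization; none of that is shown here.

```python
# Spectral first step of matrix completion: rescaled zero-fill + rank-r SVD.
import numpy as np

def spectral_completion(M_obs, mask, r):
    p = mask.mean()                        # empirical sampling rate
    U, s, Vt = np.linalg.svd(M_obs / p, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]     # best rank-r approximation

rng = np.random.default_rng(0)
n, r = 200, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # rank-3 ground truth
mask = rng.random((n, n)) < 0.3                         # observe 30% of entries
M_hat = spectral_completion(np.where(mask, M, 0.0), mask, r)
print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))    # relative error
```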

Matrix Completion from a Few Entries

1 code implementation20 Jan 2009 Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh

In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.

Matrix Completion
