1 code implementation • ICML 2020 • Zinan Lin, Kiran Thekumparampil, Giulia Fanti, Sewoong Oh
This contrastive regularizer is inspired by a natural notion of disentanglement: latent traversal.
no code implementations • 1 Nov 2024 • Zerui Cheng, Edoardo Contente, Ben Finch, Oleg Golev, Jonathan Hayase, Andrew Miller, Niusha Moshrefi, Anshul Nasery, Sandeep Nailwal, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath
Artificial Intelligence (AI) has steadily improved across a wide range of tasks.
no code implementations • 21 Aug 2024 • Wei-Ning Chen, Peter Kairouz, Sewoong Oh, Zheng Xu
In this paper, we investigate potential randomization approaches that can complement current practices of input-based methods (such as licensing data and prompt filtering) and output-based methods (such as recitation checker, license checker, and model-based similarity score) for copyright protection.
no code implementations • 8 Aug 2024 • Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs).
1 code implementation • 23 Jul 2024 • Jonathan Hayase, Alisa Liu, Yejin Choi, Sewoong Oh, Noah A. Smith
Our key insight is that the ordered list of merge rules learned by a BPE tokenizer naturally reveals information about the token frequencies in its training data.
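As a minimal illustration of this insight (a toy trainer, not the paper's inference procedure), each BPE merge is by construction the most frequent adjacent pair at that point in training, so the ordered merge list records frequency information about the corpus:

```python
from collections import Counter

def bpe_merge_order(corpus_tokens, num_merges):
    """Toy BPE trainer: at each step, merge the most frequent adjacent
    pair. The resulting merge list is ordered by training-data frequency,
    which is the signal exploited for inferring data mixtures."""
    seqs = [list(tok) for tok in corpus_tokens]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        # Apply the merge everywhere before the next round.
        for s in seqs:
            i = 0
            while i < len(s) - 1:
                if s[i] == a and s[i + 1] == b:
                    s[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(bpe_merge_order(["low", "lower", "lowest", "low"], 3))
```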
no code implementations • 5 Jul 2024 • Divyansh Pareek, Simon S. Du, Sewoong Oh
Self-Distillation is a special type of knowledge distillation where the student model has the same architecture as the teacher model.
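A minimal PyTorch sketch of the setup (a generic knowledge-distillation loss for classification, assuming `student` and `teacher` share an architecture; this illustrates the setting, not the paper's analysis):

```python
import torch
import torch.nn.functional as F

def self_distill_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    """One training step: the student (same architecture as the teacher)
    fits a mix of the hard labels and the teacher's softened predictions
    at temperature T."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    logits = student(x)
    loss = (1 - alpha) * F.cross_entropy(logits, y) \
        + alpha * (T * T) * F.kl_div(F.log_softmax(logits / T, dim=-1),
                                     soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```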
no code implementations • 2 Jul 2024 • Anshul Nasery, Jonathan Hayase, Pang Wei Koh, Sewoong Oh
Furthermore, the final merged model is typically restricted to be of the same size as the original models.
2 code implementations • 17 Jun 2024 • Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models.
no code implementations • 27 May 2024 • Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna
By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.
no code implementations • 8 May 2024 • Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage
The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns.
no code implementations • 2 May 2024 • Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
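For context, the baseline such schemes are measured against is the plain Gaussian mechanism with $L_2$ clipping; a sketch of that baseline (not the paper's scheme) follows:

```python
import numpy as np

def dp_mean_l2(X, clip_norm, epsilon, delta):
    """Baseline central-DP mean estimate: clip each row to an L2 ball of
    radius clip_norm, average, and add isotropic Gaussian noise calibrated
    to the (replacement) sensitivity 2 * clip_norm / n of the clipped mean."""
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    X_clipped = X * np.minimum(1.0, clip_norm / norms)
    sigma = (2.0 * clip_norm / n) * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return X_clipped.mean(axis=0) + np.random.normal(0.0, sigma, size=d)
```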
no code implementations • 23 Apr 2024 • Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith
We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix.
no code implementations • 29 Feb 2024 • Shuqi Ke, Charlie Hou, Giulia Fanti, Sewoong Oh
We provide theoretical insights into the convergence of DP fine-tuning within an overparameterized neural network and establish a utility curve that determines the allocation of privacy budget between linear probing and full fine-tuning.
1 code implementation • 21 Feb 2024 • Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu
Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users' intentions.
1 code implementation • 14 Feb 2024 • S Ashwin Hebbar, Sravan Kumar Ankireddy, Hyeji Kim, Sewoong Oh, Pramod Viswanath
Progress in designing channel codes has been driven by human ingenuity and, fittingly, has been sporadic.
1 code implementation • 14 Oct 2023 • Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy.
no code implementations • 6 Oct 2023 • Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim
In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge.
1 code implementation • 20 Jul 2023 • Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh
We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates.
no code implementations • 20 May 2023 • Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H. Brendan McMahan, Sewoong Oh, Zheng Xu, Manzil Zaheer
We study (differentially) private federated learning (FL) of language models.
3 code implementations • NeurIPS 2023 • Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms.
no code implementations • 19 Feb 2023 • Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang
To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training requires an optimization algorithm to go through two phases.
1 code implementation • 6 Feb 2023 • Galen Andrew, Peter Kairouz, Sewoong Oh, Alina Oprea, H. Brendan McMahan, Vinith M. Suriyakumar
Privacy estimation techniques for differentially private (DP) algorithms are useful for comparing against analytical bounds, or for empirically measuring privacy loss in settings where known analytical bounds are not tight.
no code implementations • 30 Jan 2023 • Xiyang Liu, Prateek Jain, Weihao Kong, Sewoong Oh, Arun Sai Suggala
Under label-corruption, this is the first efficient linear regression algorithm to guarantee both $(\varepsilon,\delta)$-DP and robustness.
no code implementations • 16 Jan 2023 • Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
Next, we derive the soft-decision based version of our algorithm, called soft-subRPA, that not only improves upon the performance of subRPA but also enables a differentiable decoding algorithm.
2 code implementations • bioRxiv 2023 • Melih Yilmaz, William E. Fondrie, Wout Bittremieux, Rowan Nelson, Varun Ananth, Sewoong Oh, William Stafford Noble
A fundamental challenge for any mass spectrometry-based proteomics experiment is the identification of the peptide that generated each acquired tandem mass spectrum.
1 code implementation • 30 Dec 2022 • Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
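The accompanying open-source package computes the measure directly from raw text. A usage sketch (the argument names follow the public mauve-text release; treat them as an assumption and check the package docs):

```python
# pip install mauve-text
import mauve

human_text = ["The cat sat on the mat.", "It rained all afternoon."]
model_text = ["The cat sat on a chair.", "It was sunny all afternoon."]

out = mauve.compute_mauve(p_text=human_text, q_text=model_text,
                          device_id=0, max_text_length=256, verbose=False)
print(out.mauve)  # scalar in (0, 1]; higher means closer distributions
```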
1 code implementation • CVPR 2023 • Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan
Small on-device models have been successfully trained with user-level differential privacy (DP) for next word prediction and image classification tasks in the past.
1 code implementation • 14 Oct 2022 • Matt Jordan, Jonathan Hayase, Alexandros G. Dimakis, Sewoong Oh
Neural network verification aims to provide provable bounds for the output of a neural network for a given input range.
1 code implementation • 12 Oct 2022 • Jonathan Hayase, Sewoong Oh
In a backdoor attack, an attacker injects corrupted examples into the training set.
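A minimal sketch of this threat model (a hypothetical pixel-patch trigger on image data of shape (n, H, W); real attacks vary in trigger design):

```python
import numpy as np

def inject_backdoor(images, labels, target_class, poison_frac=0.01,
                    patch_size=3, patch_value=1.0):
    """Stamp a small trigger patch in the bottom-right corner of a random
    poison_frac fraction of images and relabel them to the target class.
    Assumes images of shape (n, H, W) with values in [0, 1]."""
    images, labels = images.copy(), labels.copy()
    n = len(images)
    idx = np.random.choice(n, size=max(1, int(poison_frac * n)),
                           replace=False)
    images[idx, -patch_size:, -patch_size:] = patch_value
    labels[idx] = target_class
    return images, labels
```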
no code implementations • 2 Oct 2022 • Zaid Harchaoui, Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi
We consider stochastic gradient descent on suitable functions of large symmetric matrices that are invariant under permuting the rows and columns by the same permutation.
1 code implementation • 1 Oct 2022 • S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod Viswanath
We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32, 16) and Polar(64, 22) codes.
1 code implementation • 10 Aug 2022 • Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
Web-crawled datasets have enabled remarkable generalization capabilities in recent image-text models such as CLIP (Contrastive Language-Image pre-training) or Flamingo, but little is known about the dataset creation processes.
no code implementations • 1 Jun 2022 • Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables the practitioners to bring their own base optimization algorithm and use it as a black-box to obtain the near-optimal privacy-loss trade-off.
no code implementations • 27 May 2022 • Xiyang Liu, Weihao Kong, Prateek Jain, Sewoong Oh
For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=\tilde O(d)$.
1 code implementation • 24 May 2022 • Shuaiqi Wang, Jonathan Hayase, Giulia Fanti, Sewoong Oh
We propose shadow learning, a framework for defending against backdoor attacks in the FL setting under long-range training.
no code implementations • 7 Feb 2022 • Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai
Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks.
no code implementations • 19 Jan 2022 • Kiran Koshy Thekumparampil, Niao He, Sewoong Oh
We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$.
no code implementations • NeurIPS 2021 • Kiran K. Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
To cope with such data scarcity, meta-representation learning methods train across many related tasks to find a shared (lower-dimensional) representation of the data where all tasks can be solved accurately.
no code implementations • 18 Nov 2021 • Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi
Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems.
no code implementations • 12 Nov 2021 • Xiyang Liu, Weihao Kong, Sewoong Oh
The key insight is that if we design an exponential mechanism that accesses the data only via one-dimensional robust statistics, then the resulting local sensitivity can be dramatically reduced.
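For illustration, the classic rank-based exponential mechanism for the median shows how a one-dimensional statistic with sensitivity 1 drives the mechanism (a textbook example, not the paper's estimator):

```python
import numpy as np

def exponential_mechanism(data, candidates, score_fn, sensitivity, epsilon):
    """Sample a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scores = np.array([score_fn(data, c) for c in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(candidates, p=probs)

# Private median on a grid: the rank-based score changes by at most 1
# when one data point changes, so its sensitivity is 1.
data = np.random.randn(1000)
grid = np.linspace(-3.0, 3.0, 61)
pick = exponential_mechanism(
    data, grid,
    score_fn=lambda d, c: -abs(np.sum(d < c) - len(d) / 2.0),
    sensitivity=1.0, epsilon=1.0)
```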
1 code implementation • NeurIPS 2021 • Jinwoo Jeon, Jaechang Kim, Kangwook Lee, Sewoong Oh, Jungseul Ok
Federated Learning (FL) is a distributed learning framework, in which the local data never leaves clients' devices to preserve privacy, and the server trains models on the data by accessing only the gradients of those local data.
1 code implementation • 29 Aug 2021 • Ashok Vardhan Makkuva, Xiyang Liu, Mohammad Vahid Jamali, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
In this paper, we construct KO codes, a computationally efficient family of deep-learning-driven (encoder, decoder) pairs that outperform the state-of-the-art reliability performance on the standardized AWGN channel.
no code implementations • ICLR 2022 • Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh
We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of R while leveraging the similarity between clients.
1 code implementation • NeurIPS 2021 • Lang Liu, Krishna Pillutla, Sean Welleck, Sewoong Oh, Yejin Choi, Zaid Harchaoui
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance.
no code implementations • 18 May 2021 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
We show that, for a constant subspace dimension MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
1 code implementation • 22 Apr 2021 • Jonathan Hayase, Weihao Kong, Raghav Somani, Sewoong Oh
There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones.
1 code implementation • NeurIPS 2021 • Xiyang Liu, Weihao Kong, Sham Kakade, Sewoong Oh
In statistical learning and analysis from shared data, which is increasingly widely adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness.
no code implementations • 12 Feb 2021 • Charlie Hou, Kiran K. Thekumparampil, Giulia Fanti, Sewoong Oh
Our goal is to design an algorithm that can harness the benefit of similarity in the clients while recovering the Minibatch Mirror-prox performance under arbitrary heterogeneity (up to log factors).
no code implementations • ICLR 2022 • Xingyu Wang, Sewoong Oh, Chang-Han Rhee
The empirical success of deep learning is often attributed to SGD's mysterious ability to avoid sharp local minima in the loss landscape, as sharp minima are known to lead to poor generalization.
no code implementations • 2 Feb 2021 • Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
To lower the complexity of our decoding algorithm, referred to as subRPA in this paper, we investigate different ways for pruning the projections.
no code implementations • NeurIPS 2020 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
Further, instead of a PO if we only have a linear minimization oracle (LMO, a la Frank-Wolfe) to access the constraint set, an extension of our method, MOLES, finds a feasible $\epsilon$-suboptimal solution using $O(\epsilon^{-2})$ LMO calls and FO calls---both match known lower bounds, resolving a question left open since White (1993).
no code implementations • 18 Aug 2020 • Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
DeepCode is designed and evaluated for the AWGN channel with (potentially delayed) uncoded output feedback.
no code implementations • NeurIPS 2020 • Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh
Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $\Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(\log k)$.
no code implementations • ICML 2020 • Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labeled data.
1 code implementation • NeurIPS 2019 • Xiyang Liu, Sewoong Oh
We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required.
1 code implementation • NeurIPS 2019 • Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications.
no code implementations • 25 Sep 2019 • Anwesa Choudhuri, Ashok Vardhan Makkuva, Ranvir Rana, Sewoong Oh, Girish Chowdhary, Alexander Schwing
In fact, contrastive disentanglement and unsupervised recovery are often combined in that we seek additional variations that exhibit salient factors/properties.
2 code implementations • ICML 2020 • Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee
Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping.
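The classical fact this framework builds on is Brenier's theorem: for quadratic cost, with the source measure $P$ absolutely continuous, the optimal transport map is the gradient of a convex function:

```latex
% Brenier's theorem: the quadratic-cost optimal map from P to Q
% is the gradient of a convex potential phi.
\min_{T \,:\, T_\# P = Q} \; \mathbb{E}_{x \sim P}\!\left[\|x - T(x)\|^2\right]
\quad \text{is attained by} \quad
T^\star = \nabla \phi^\star, \qquad \phi^\star \ \text{convex}.
```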
2 code implementations • NeurIPS 2019 • Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong Oh
This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x, y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$.
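The canonical first-order baseline for such problems is simultaneous gradient descent-ascent; a sketch on a strongly-convex-strongly-concave toy objective (the baseline, not the accelerated methods the paper proposes):

```python
def gradient_descent_ascent(grad_x, grad_y, x0, y0, eta=0.1, steps=1000):
    """Simultaneous updates for min_x max_y g(x, y):
    descend on x, ascend on y with the same stepsize."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
    return x, y

# g(x, y) = 0.5*x**2 + x*y - 0.5*y**2 has its saddle point at (0, 0).
x, y = gradient_descent_ascent(grad_x=lambda x, y: x + y,
                               grad_y=lambda x, y: x - y,
                               x0=1.0, y0=1.0)
print(x, y)  # both approach 0
```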
1 code implementation • 14 Jun 2019 • Zinan Lin, Kiran Koshy Thekumparampil, Giulia Fanti, Sewoong Oh
Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution.
no code implementations • 9 Jun 2019 • Kiran Koshy Thekumparampil, Sewoong Oh, Ashish Khetan
Matching the performance of conditional Generative Adversarial Networks with little supervision is an important task, especially in venturing into new domains.
no code implementations • 6 Jun 2019 • Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath
Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks.
2 code implementations • 24 May 2019 • Xiyang Liu, Sewoong Oh
We pose it as a property estimation problem, and study the fundamental trade-offs involved in the accuracy in estimated privacy guarantees and the number of samples required.
1 code implementation • 6 Mar 2019 • Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
We focus on Turbo codes and propose DeepTurbo, a novel deep learning based architecture for Turbo decoding.
no code implementations • 1 Dec 2018 • Ashish Khetan, Harshay Shah, Sewoong Oh
This representation is crucial in introducing a novel estimator for the number of connected components for general graphs, under the knowledge of the spectral gap of the original graph.
1 code implementation • 30 Nov 2018 • Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards.
2 code implementations • NeurIPS 2018 • Kiran Koshy Thekumparampil, Ashish Khetan, Zinan Lin, Sewoong Oh
When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN).
no code implementations • 9 Oct 2018 • Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh
Theoretically, we prove that the proposed scheme is optimal for compressing one-hidden-layer ReLU neural networks.
no code implementations • 9 Oct 2018 • Weihao Gao, Ashok Vardhan Makkuva, Sewoong Oh, Pramod Viswanath
Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points.
1 code implementation • NeurIPS 2018 • Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications.
3 code implementations • ICLR 2018 • Hyeji Kim, Yihan Jiang, Ranvir Rana, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
We show that creatively designed and trained RNN architectures can decode well-known sequential codes such as the convolutional and turbo codes with close to optimal performance on the additive white Gaussian noise (AWGN) channel, which itself is achieved by breakthrough algorithms of our times (Viterbi and BCJR decoders, representing dynamic programming and forward-backward algorithms).
1 code implementation • ICLR 2018 • Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li
Recently popularized graph neural networks achieve the state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches.
no code implementations • 21 Feb 2018 • Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath
Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters.
7 code implementations • NeurIPS 2018 • Zinan Lin, Ashish Khetan, Giulia Fanti, Sewoong Oh
Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples.
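The core idea, packing, has the discriminator classify $m$ samples jointly, so a generator suffering mode collapse produces visibly low-diversity $m$-tuples even when each single sample looks realistic. A minimal PyTorch sketch (layer sizes are illustrative):

```python
import torch.nn as nn

class PackedDiscriminator(nn.Module):
    """Discriminator that scores m samples jointly by concatenating
    them along the feature axis before classification."""
    def __init__(self, dim, m=4, hidden=128):
        super().__init__()
        self.m = m
        self.net = nn.Sequential(nn.Linear(dim * m, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):                             # x: (batch * m, dim)
        packed = x.reshape(-1, x.shape[-1] * self.m)  # (batch, dim * m)
        return self.net(packed)
```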
no code implementations • NeurIPS 2017 • Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh
As our main result, we characterize the minimax optimal sample size for top-$K$ ranking.
1 code implementation • NeurIPS 2017 • Ashish Khetan, Sewoong Oh
This paper focuses on the technical challenges in accurately estimating the Schatten norms from a sampling of a matrix.
1 code implementation • NeurIPS 2017 • Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
We provide numerical experiments suggesting the superiority of the proposed estimator over two natural heuristics: adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, or quantizing the samples and applying standard estimators tailored for purely discrete variables.
no code implementations • NeurIPS 2017 • Hyeji Kim, Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest.
no code implementations • 24 Apr 2017 • Sahand Negahban, Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu
This also allows one to compute similarities among users and items to be used for categorization and search.
no code implementations • 18 Mar 2017 • Ashish Khetan, Sewoong Oh
We propose first estimating the Schatten $k$-norms of a matrix, and then applying Chebyshev approximation to the spectral sum function or applying moment matching in Wasserstein distance to recover the singular values.
no code implementations • 28 Feb 2017 • Jungseul Ok, Sewoong Oh, Yunhun Jang, Jinwoo Shin, Yung Yi
Crowdsourcing platforms have emerged as popular venues for purchasing human intelligence at low cost for large volumes of tasks.
no code implementations • NeurIPS 2016 • Weihao Gao, Sewoong Oh, Pramod Viswanath
In this paper, we combine both these approaches to design new estimators of entropy and mutual information that outperform state of the art methods.
no code implementations • NeurIPS 2016 • Ashish Khetan, Sewoong Oh
For massive and heterogeneous modern datasets, it is of fundamental interest to provide guarantees on the accuracy of estimation when computational resources are limited.
1 code implementation • 11 Apr 2016 • Weihao Gao, Sewoong Oh, Pramod Viswanath
In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the bias as a function of number of samples.
no code implementations • 14 Mar 2016 • Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh
First, in a general comparison model where item pairs to compare are given a priori, we attain an upper and lower bound on the sample size for reliable recovery of the top-$K$ ranked items.
no code implementations • 11 Feb 2016 • Jungseul Ok, Sewoong Oh, Jinwoo Shin, Yung Yi
Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers.
no code implementations • NeurIPS 2016 • Ashish Khetan, Sewoong Oh
Under this generalized Dawid-Skene model, we characterize the fundamental trade-off between budget and accuracy.
no code implementations • 10 Feb 2016 • Weihao Gao, Sreeram Kannan, Sewoong Oh, Pramod Viswanath
We conduct an axiomatic study of the problem of estimating the strength of a known causal relationship between a pair of variables.
no code implementations • 21 Jan 2016 • Ashish Khetan, Sewoong Oh
Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference.
no code implementations • NeurIPS 2015 • Peter Kairouz, Sewoong Oh, Pramod Viswanath
In this setting, each party is interested in computing a function on its private bit and all the other parties' bits.
no code implementations • NeurIPS 2015 • Sewoong Oh, Kiran K. Thekumparampil, Jiaming Xu
In order to predict the preferences, we want to learn the underlying model from noisy observations of the low-rank matrix, collected as revealed preferences in various forms of ordinal data.
no code implementations • 29 Dec 2014 • Giulia Fanti, Peter Kairouz, Sewoong Oh, Pramod Viswanath
Whether for fear of judgment or personal endangerment, it is crucial to keep anonymous the identity of the user who initially posted a sensitive message.
no code implementations • NeurIPS 2014 • Sewoong Oh, Devavrat Shah
In case of single MNL models (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible.
no code implementations • NeurIPS 2014 • Bruce Hajek, Sewoong Oh, Jiaming Xu
For a given assignment of items to users, we first derive an oracle lower bound of the estimation error that holds even for the more general Thurstone models.
1 code implementation • NeurIPS 2014 • Prateek Jain, Sewoong Oh
We show that under certain standard assumptions, our method can recover a three-mode $n\times n\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \log^4 n)$ randomly sampled entries.
no code implementations • 12 Nov 2013 • Prateek Jain, Sewoong Oh
The main challenge in learning mixtures of discrete product distributions is that these low-rank tensors cannot be obtained directly from the sample moments.
no code implementations • 4 Nov 2013 • Peter Kairouz, Sewoong Oh, Pramod Viswanath
Sequential querying of differentially private mechanisms degrades the overall privacy level.
no code implementations • NeurIPS 2012 • Sahand Negahban, Sewoong Oh, Devavrat Shah
In most settings, in addition to obtaining a ranking, finding 'scores' for each object (e.g., a player's rating) is of interest for understanding the intensity of the preferences.
no code implementations • 8 Sep 2012 • Sahand Negahban, Sewoong Oh, Devavrat Shah
To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (equivalent to the Multinomial Logit (MNL) for pair-wise comparisons) in which each object has an associated score which determines the probabilistic outcomes of pair-wise comparisons between objects.
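Concretely, under BTL each item $i$ carries a positive latent score $w_i$, and item $i$ beats item $j$ with probability $w_i/(w_i + w_j)$. A simulation sketch:

```python
import numpy as np

def btl_win_prob(w_i, w_j):
    """P(item i beats item j) under the Bradley-Terry-Luce model."""
    return w_i / (w_i + w_j)

rng = np.random.default_rng(0)
w = np.array([3.0, 2.0, 1.0])                    # latent scores
i_beats_j = rng.random() < btl_win_prob(w[0], w[2])  # P = 3/4 here
```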
no code implementations • NeurIPS 2011 • David R. Karger, Sewoong Oh, Devavrat Shah
Crowdsourcing systems, in which tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading.
no code implementations • 17 Oct 2011 • David R. Karger, Sewoong Oh, Devavrat Shah
Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks.
1 code implementation • 27 Oct 2009 • Raghunandan H. Keshavan, Sewoong Oh
We consider the problem of reconstructing a low-rank matrix from a small subset of its entries.
1 code implementation • NeurIPS 2009 • Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries.
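A standard spectral baseline for this problem zero-fills the missing entries, rescales by the inverse sampling rate, and truncates the SVD; the paper's algorithm refines such an initialization, so the sketch below is only the starting point:

```python
import numpy as np

def spectral_completion(M_obs, mask, rank):
    """Zero-fill unobserved entries, rescale by 1/p so the expectation
    matches the full matrix, then keep only the top-`rank` SVD factors."""
    p = mask.mean()                              # observed fraction
    U, s, Vt = np.linalg.svd(np.where(mask, M_obs, 0.0) / p,
                             full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```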
1 code implementation • 20 Jan 2009 • Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemerédi and Feige-Ofek on the spectrum of sparse random matrices.