1 code implementation • 10 Jan 2025 • Taywon Min, Haeone Lee, Hanho Ryu, Yongchan Kwon, Kimin Lee
By quantifying the impact of human feedback on reward models, we believe that influence functions can enhance feedback interpretability and contribute to scalable oversight in RLHF, helping labelers provide more accurate and consistent feedback.
no code implementations • 9 Nov 2024 • Jiayao Zhang, Yuran Bi, Mengye Cheng, Jinfei Liu, Kui Ren, Qiheng Sun, Yihang Wu, Yang Cao, Raul Castro Fernandez, Haifeng Xu, Ruoxi Jia, Yongchan Kwon, Jian Pei, Jiachen T. Wang, Haocheng Xia, Li Xiong, Xiaohui Yu, James Zou
Data is the new oil of the 21st century.
no code implementations • 21 Oct 2024 • Zhaonan Qu, Yongchan Kwon
Viewing common issues in IV estimation as distributional uncertainties, we propose DRIVE, a distributionally robust IV estimation method.
no code implementations • 9 Oct 2024 • Yongchan Kwon, Sokbae Lee, Guillaume A. Pouliot
We propose a variant of the Shapley value, the group Shapley value, to interpret counterfactual simulations in structural economic models by quantifying the importance of different components.
2 code implementations • 7 Aug 2024 • Yifan Sun, Jingyan Shen, Yongchan Kwon
Data valuation has emerged as a powerful framework for quantifying each datum's contribution to the training of a machine learning model.
1 code implementation • 21 Jul 2024 • Yizi Zhang, Jingyan Shen, Xiaoxue Xiong, Yongchan Kwon
Evaluating the contribution of individual data points to a model's prediction is critical for interpreting model predictions and improving model performance.
no code implementations • 28 May 2024 • Shuran Zheng, Xuan Qi, Rui Ray Chen, Yongchan Kwon, James Zou
Data plays a central role in the development of modern artificial intelligence, with high-quality data emerging as a key driver of model performance.
no code implementations • 6 May 2024 • Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia
Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research.
no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.
1 code implementation • 2 Oct 2023 • Yongchan Kwon, Eric Wu, Kevin Wu, James Zou
Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline.
2 code implementations • NeurIPS 2023 • Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon
Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset.
1 code implementation • 4 May 2023 • Weixin Liang, Yining Mao, Yongchan Kwon, Xinyu Yang, James Zou
Our work highlights the importance of understanding the nonlinear effects of model improvement on performance in different subpopulations, and has the potential to inform the development of more equitable and responsible machine learning models.
2 code implementations • 16 Apr 2023 • Yongchan Kwon, James Zou
As a result, it has been recognized as infeasible to apply to large datasets.
1 code implementation • 27 Sep 2022 • Yongchan Kwon, James Zou
On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.
2 code implementations • 3 Mar 2022 • Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, James Zou
Our systematic analysis demonstrates that this gap is caused by a combination of model initialization and contrastive learning optimization.
no code implementations • 26 Jan 2022 • Yongchan Kwon, Antonio Ginart, James Zou
We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users.
2 code implementations • 26 Oct 2021 • Yongchan Kwon, James Zou
Data Shapley has recently been proposed as a principled framework to quantify the contribution of each individual datum in machine learning.
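As a rough illustration of the idea behind Shapley-style data valuation (not the method of the paper listed above), the sketch below computes exact Shapley values by averaging each point's marginal contribution over every ordering of a tiny dataset. The names `exact_shapley` and `toy_utility`, and the utility itself, are hypothetical stand-ins; in practice the utility would be something like the validation performance of a model trained on the subset.

```python
from itertools import permutations
from math import factorial


def exact_shapley(n, utility):
    """Exact Shapley values for points 0..n-1.

    Averages each point's marginal contribution over all n! orderings,
    so it is only feasible for very small n.
    """
    values = [0.0] * n
    for order in permutations(range(n)):
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in order:
            coalition.add(i)
            cur = utility(frozenset(coalition))
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    total = factorial(n)
    return [v / total for v in values]


def toy_utility(subset):
    """Hypothetical utility (an assumption for illustration only).

    Points 0 and 1 are redundant copies that jointly add 0.6;
    point 2 carries unique information worth 0.3.
    """
    score = 0.0
    if 0 in subset or 1 in subset:
        score += 0.6
    if 2 in subset:
        score += 0.3
    return score


values = exact_shapley(3, toy_utility)
print(values)
```

Under this toy utility the two redundant points split their shared 0.6 equally, and the values sum to the utility of the full dataset. The factorial cost of the exact computation is why practical methods rely on approximations.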
no code implementations • 15 Sep 2020 • Antonio Ginart, Eva Zhang, Yongchan Kwon, James Zou
A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more likely to obtain additional user data (e.g., in the form of a Yelp review).
no code implementations • 2 Jul 2020 • Yongchan Kwon, Manuel A. Rivas, James Zou
Distributional data Shapley value (DShapley) has recently been proposed as a principled framework to quantify the contribution of each individual datum in machine learning.
1 code implementation • ICML 2020 • Yongchan Kwon, Wonyoung Kim, Joong-Ho Won, Myunghee Cho Paik
We show that our approximation and risk consistency results naturally extend to the cases when data are locally perturbed.
no code implementations • 20 Mar 2019 • Seongok Ryu, Yongchan Kwon, Woo Youn Kim
Deep neural networks have outperformed existing machine learning models in various molecular applications.
1 code implementation • 28 Jan 2019 • Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik
We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).
no code implementations • 19 Nov 2018 • Seongok Ryu, Yongchan Kwon, Woo Youn Kim
In chemistry, deep neural network models have been increasingly utilized in a variety of applications such as molecular property predictions, novel molecule designs, and planning chemical reactions.
no code implementations • MIDL 2018 • Yongchan Kwon, Joong-Ho Won, Beom Joon Kim, Myunghee Cho Paik
Most recent research on neural networks in computer vision has focused on improving the accuracy of point predictions by developing various network architectures or learning algorithms.
General Classification
Ischemic Stroke Lesion Segmentation