no code implementations • 22 Dec 2024 • Jio Oh, Geon Heo, Seungjun Oh, Jindong Wang, Xing Xie, Steven Euijong Whang
Despite the recent advancement of Large Language Models (LLMs), they struggle with complex queries that often involve multiple conditions, which are common in real-world scenarios.
no code implementations • 3 Oct 2024 • Soyeon Kim, Yuji Roh, Geon Heo, Steven Euijong Whang
Generative models must ensure both privacy and fairness for Trustworthy AI.
no code implementations • 2 Oct 2024 • Jaeyoung Park, Minsu Kim, Steven Euijong Whang
We then propose a fair class-incremental learning framework that adjusts the training weights of current task samples to change the direction of the average gradient vector and thus reduce the forgetting of underperforming groups and achieve fairness.
no code implementations • 28 May 2024 • Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang
We study the problem of robust data augmentation for regression tasks in the presence of noisy data.
1 code implementation • 8 Mar 2024 • Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang
Unlike knowledge graphs, which are also used to evaluate LLMs, relational databases have integrity constraints that can be used to better construct complex in-depth questions and verify answers: (1) functional dependencies can be used to pinpoint critical keywords that an LLM must know to properly answer a given question containing certain attribute values; and (2) foreign key constraints can be used to join relations and construct multi-hop questions, which can be arbitrarily long and used to debug intermediate answers.
no code implementations • 7 Feb 2024 • Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao
By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks.
1 code implementation • 23 Jan 2024 • Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang
Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.
no code implementations • 15 Dec 2023 • Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang
However, we contend that explicitly utilizing the drifted data together leads to much better model accuracy and propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.
no code implementations • 24 May 2023 • Geon Heo, Junseok Seo, Steven Euijong Whang
Personalized privacy becomes critical in deep learning for Trustworthy AI.
no code implementations • 5 Feb 2023 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness.
no code implementations • 22 Sep 2022 • Hyunseung Hwang, Steven Euijong Whang
We study the problem of explainability-first clustering where explainability becomes a first-class citizen for clustering.
1 code implementation • 15 Sep 2022 • Hantian Zhang, Ki Hyun Tae, Jaeyoung Park, Xu Chu, Steven Euijong Whang
We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips.
no code implementations • 7 Feb 2022 • Geon Heo, Steven Euijong Whang
Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and can scale to large data.
no code implementations • 13 Dec 2021 • Steven Euijong Whang, Yuji Roh, Hwanjun Song, Jae-Gil Lee
In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications.
no code implementations • NeurIPS 2021 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
In this work, we propose a sample selection-based algorithm for fair and robust training.
no code implementations • 7 Jun 2021 • Seong-Hyeon Hwang, Steven Euijong Whang
Data augmentation is becoming essential for improving regression performance in critical applications including manufacturing, climate prediction, and finance.
no code implementations • 15 Jan 2021 • Steven Euijong Whang, Ki Hyun Tae, Yuji Roh, Geon Heo
Second, responsible AI must be broadly supported, preferably in all steps of machine learning.
1 code implementation • ICLR 2021 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
We address this problem via the lens of bilevel optimization.
no code implementations • 7 Apr 2020 • Geon Heo, Yuji Roh, Seonghyeon Hwang, Dayun Lee, Steven Euijong Whang
We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification.
2 code implementations • 10 Mar 2020 • Ki Hyun Tae, Steven Euijong Whang
Instead, we contend that one needs to selectively acquire data and propose Slice Tuner, which acquires possibly different amounts of data per slice such that the model accuracy and fairness on all slices are optimized.
1 code implementation • ICML 2020 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh
Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning.
no code implementations • 25 Sep 2019 • Yuji Roh, Kangwook Lee, Gyeong Jo Hwang, Steven Euijong Whang, Changho Suh
We consider the problem of fair and robust model training in the presence of data poisoning.
no code implementations • 22 Apr 2019 • Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang
As machine learning is used in sensitive applications, it becomes imperative that the trained model is accurate, fair, and robust to attacks.
no code implementations • 8 Nov 2018 • Yuji Roh, Geon Heo, Steven Euijong Whang
Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data.
no code implementations • 16 Jul 2018 • Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang
As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models.