Search Results for author: Steven Euijong Whang

Found 25 papers, 6 papers with code

Better Think with Tables: Leveraging Tables to Enhance Large Language Model Comprehension

no code implementations • 22 Dec 2024 Jio Oh, Geon Heo, Seungjun Oh, Jindong Wang, Xing Xie, Steven Euijong Whang

Despite recent advances in Large Language Models (LLMs), they struggle with complex queries that involve multiple conditions, which are common in real-world scenarios.

Language Modeling Language Modelling +1

Fair Class-Incremental Learning using Sample Weighting

no code implementations • 2 Oct 2024 Jaeyoung Park, Minsu Kim, Steven Euijong Whang

We then propose a fair class-incremental learning framework that adjusts the training weights of current task samples to change the direction of the average gradient vector and thus reduce the forgetting of underperforming groups and achieve fairness.

class-incremental learning Class Incremental Learning +2
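The abstract describes reweighting current-task samples so the average gradient favors underperforming groups. A minimal sketch of that idea, assuming a simple inverse-accuracy heuristic (the function name `group_weights` and the weighting rule are illustrative, not the paper's exact scheme):

```python
import numpy as np

def group_weights(group_ids, group_accuracy):
    """Per-sample weights: groups with lower accuracy (the ones the model
    is forgetting) get larger weights, shifting the average gradient
    direction toward them when used in a weighted loss."""
    acc = np.array([group_accuracy[g] for g in group_ids], dtype=float)
    w = (1.0 - acc) + 1e-8          # emphasize underperforming groups
    return w * len(w) / w.sum()     # normalize so the weights average to 1

# Group 0 is underperforming (60% accuracy); group 1 is fine (95%).
groups = [0, 0, 1, 1]
w = group_weights(groups, {0: 0.60, 1: 0.95})
# Training would then use (w * per_sample_loss).mean() instead of loss.mean()
```

The normalization keeps the overall loss scale unchanged, so only the gradient's direction, not its magnitude, is rebalanced across groups.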

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

1 code implementation • 8 Mar 2024 Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

Unlike knowledge graphs, which are also used to evaluate LLMs, relational databases have integrity constraints that can be used to better construct complex in-depth questions and verify answers: (1) functional dependencies can be used to pinpoint critical keywords that an LLM must know to properly answer a given question containing certain attribute values; and (2) foreign key constraints can be used to join relations and construct multi-hop questions, which can be arbitrarily long and used to debug intermediate answers.

Attribute Hallucination +2
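The foreign-key idea in the abstract can be illustrated with toy relations (the schema, attribute names, and question template below are assumptions for illustration, not the benchmark's actual data):

```python
# Two toy relations linked by a foreign key: Movie.director_id -> Director.id
directors = {1: {"name": "Christopher Nolan", "birth_year": 1970}}
movies = [{"title": "Inception", "director_id": 1}]

def multi_hop_question(movie):
    """Join Movie to Director via the foreign key, then ask about an
    attribute of the joined tuple. The attribute value from the database
    doubles as the automatically verifiable ground-truth answer."""
    d = directors[movie["director_id"]]          # the FK join (one hop)
    q = f"In which year was the director of {movie['title']} born?"
    return q, d["birth_year"]                    # question, ground truth

q, answer = multi_hop_question(movies[0])
```

Chaining further foreign keys in the same way yields arbitrarily long multi-hop questions whose intermediate answers can each be checked against the database.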

LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

no code implementations • 7 Feb 2024 Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

By combining two complementary models, LEVI effectively suppresses problematic features in both the fine-tuning data and the pre-trained model while preserving useful features for new tasks.

Falcon: Fair Active Learning using Multi-armed Bandits

1 code implementation • 23 Jan 2024 Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang

Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.

Active Learning Attribute +4
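The title mentions multi-armed bandits over target groups; a minimal epsilon-greedy sketch of that selection step, under the assumption that each group's "reward" is the fairness improvement its labeled samples produced (the function and reward bookkeeping are illustrative, not Falcon's actual policy):

```python
import random

def pick_group(rewards, counts, epsilon=0.1, rng=random):
    """Epsilon-greedy bandit over candidate target groups: with
    probability epsilon explore a random group; otherwise exploit the
    group with the highest average fairness gain per labeled sample."""
    if rng.random() < epsilon or not any(counts.values()):
        return rng.choice(list(rewards))                 # explore
    return max(rewards, key=lambda g: rewards[g] / max(counts[g], 1))

rewards = {("female", "positive"): 0.9, ("male", "negative"): 0.2}
counts  = {("female", "positive"): 3,   ("male", "negative"): 3}
chosen = pick_group(rewards, counts, epsilon=0.0)  # pure exploitation
```

With exploration disabled, the group whose past samples improved fairness the most is selected for the next labeling round.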

Quilt: Robust Data Segment Selection against Concept Drifts

no code implementations • 15 Dec 2023 Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang

However, we contend that explicitly utilizing the drifted data together leads to much better model accuracy and propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.

Personalized DP-SGD using Sampling Mechanisms

no code implementations • 24 May 2023 Geon Heo, Junseok Seo, Steven Euijong Whang

Personalized privacy becomes critical in deep learning for Trustworthy AI.

Improving Fair Training under Correlation Shifts

no code implementations • 5 Feb 2023 Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness.

Fairness

XClusters: Explainability-first Clustering

no code implementations • 22 Sep 2022 Hyunseung Hwang, Steven Euijong Whang

We study the problem of explainability-first clustering where explainability becomes a first-class citizen for clustering.

Clustering

iFlipper: Label Flipping for Individual Fairness

1 code implementation • 15 Sep 2022 Hantian Zhang, Ki Hyun Tae, Jaeyoung Park, Xu Chu, Steven Euijong Whang

We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips.

Fairness
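The abstract describes minimizing label flips subject to individual-fairness constraints. A pure-Python sketch of the simplest special case, where similar pairs must receive identical labels: each connected component of the similarity graph then just flips its minority labels. This is an illustrative special case, not the paper's approximate linear program:

```python
from collections import defaultdict

def min_label_flips(labels, similar_pairs):
    """Zero-tolerance case: every connected component of the similarity
    graph must share one label, so flipping the within-component
    minority minimizes the total number of flips."""
    parent = list(range(len(labels)))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in similar_pairs:
        parent[find(i)] = find(j)

    comps = defaultdict(list)
    for i in range(len(labels)):
        comps[find(i)].append(i)

    flipped, flips = list(labels), 0
    for members in comps.values():
        ones = sum(labels[i] for i in members)
        majority = 1 if ones * 2 >= len(members) else 0
        for i in members:
            if flipped[i] != majority:
                flips += 1
                flipped[i] = majority
    return flipped, flips

# Samples 0, 1, 2 are all pairwise similar; flipping sample 0 suffices.
new_labels, flips = min_label_flips([0, 1, 1], [(0, 1), (0, 2)])
# -> new_labels == [1, 1, 1], flips == 1
```

The paper's LP generalizes this to soft similarity constraints and comes with guarantees on how far the approximate solution is from the optimal number of flips.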

Redactor: A Data-centric and Individualized Defense Against Inference Attacks

no code implementations • 7 Feb 2022 Geon Heo, Steven Euijong Whang

Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and can scale to large data.

Data Poisoning

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

no code implementations • 13 Dec 2021 Steven Euijong Whang, Yuji Roh, Hwanjun Song, Jae-Gil Lee

In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications.

BIG-bench Machine Learning Fairness +2

RegMix: Data Mixing Augmentation for Regression

no code implementations • 7 Jun 2021 Seong-Hyeon Hwang, Steven Euijong Whang

Data augmentation is becoming essential for improving regression performance in critical applications including manufacturing, climate prediction, and finance.

Classification Data Augmentation +2
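The core mixing step behind mixup-style augmentation for regression can be sketched as follows; note this is the vanilla mixing operation under assumed names (`regmix`, `alpha`), while the paper itself refines how much mixing each example tolerates:

```python
import random

def regmix(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Mixup-style augmentation for regression: convex-combine both the
    input features and the continuous targets with the same coefficient
    lam drawn from a Beta(alpha, alpha) distribution."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2
    return x, y

x, y = regmix([1.0, 2.0], 10.0, [3.0, 4.0], 20.0)
# y always lies between the two original targets, 10.0 and 20.0
```

Using the same coefficient for inputs and targets keeps the augmented pair on the line segment between the two originals, which is what makes the technique plausible for continuous labels.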

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

no code implementations • 7 Apr 2020 Geon Heo, Yuji Roh, Seonghyeon Hwang, Dayun Lee, Steven Euijong Whang

We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification.

BIG-bench Machine Learning Data Augmentation +2
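The weak-label combination step in data programming can be sketched with a majority vote over labeling functions that may abstain; this is a generic aggregation sketch (names assumed), not Inspector Gadget's actual learned combiner:

```python
def combine_weak_labels(votes):
    """Aggregate labeling-function outputs into one weak label.
    Each vote is 0, 1, or None (abstain); the result is the majority
    among non-abstaining votes, with confidence = winning fraction."""
    cast = [v for v in votes if v is not None]
    if not cast:
        return None, 0.0
    ones = sum(cast)
    label = 1 if ones * 2 >= len(cast) else 0
    conf = max(ones, len(cast) - ones) / len(cast)
    return label, conf

# Three labeling functions vote; one abstains.
label, conf = combine_weak_labels([1, 1, 0, None])
# -> label == 1 with confidence 2/3
```

Real data-programming systems replace the plain majority with a generative model that estimates each labeling function's accuracy, but the interface is the same: many noisy votes in, one weak label out.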

Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

2 code implementations • 10 Mar 2020 Ki Hyun Tae, Steven Euijong Whang

Instead, we contend that one needs to selectively acquire data and propose Slice Tuner, which acquires possibly-different amounts of data per slice such that the model accuracy and fairness on all slices are optimized.

Active Learning BIG-bench Machine Learning +1
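Selective, per-slice data acquisition can be sketched as a greedy allocation under an assumed power-law learning curve per slice (the `decay` exponent, `allocate` helper, and greedy loop are illustrative assumptions, not Slice Tuner's actual optimizer):

```python
def allocate(budget, slice_sizes, slice_losses, decay=0.5):
    """Greedy sketch: assume each slice's loss falls off as a power law
    in its data size, loss ~ c * n**(-decay), and repeatedly give the
    next acquired example to the slice with the largest predicted
    loss reduction."""
    alloc = {s: 0 for s in slice_sizes}
    for _ in range(budget):
        def gain(s):
            n = slice_sizes[s] + alloc[s]
            c = slice_losses[s] * slice_sizes[s] ** decay  # fit the constant
            return c * (n ** -decay - (n + 1) ** -decay)
        alloc[max(alloc, key=gain)] += 1
    return alloc

# Slice "b" is both smaller and worse, so it receives most of the budget.
alloc = allocate(10, {"a": 100, "b": 10}, {"a": 0.05, "b": 0.30})
```

Because marginal loss reduction shrinks as a slice grows, this kind of allocation naturally sends possibly-different amounts of data to different slices, which is the behavior the abstract describes.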

FR-Train: A Mutual Information-Based Approach to Fair and Robust Training

1 code implementation • ICML 2020 Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning.

Data Poisoning Fairness

FR-GAN: Fair and Robust Training

no code implementations • 25 Sep 2019 Yuji Roh, Kangwook Lee, Gyeong Jo Hwang, Steven Euijong Whang, Changho Suh

We consider the problem of fair and robust model training in the presence of data poisoning.

Attribute Data Poisoning +1

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

no code implementations • 22 Apr 2019 Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang

As machine learning is used in sensitive applications, it becomes imperative that the trained model is accurate, fair, and robust to attacks.

BIG-bench Machine Learning Fairness +1

A Survey on Data Collection for Machine Learning: a Big Data - AI Integration Perspective

no code implementations • 8 Nov 2018 Yuji Roh, Geon Heo, Steven Euijong Whang

Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data.

BIG-bench Machine Learning Feature Engineering +1

Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

no code implementations • 16 Jul 2018 Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models.

Clustering Fairness +1
