Search Results for author: Steven Euijong Whang

Found 21 papers, 6 papers with code

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

1 code implementation • 8 Mar 2024 • Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

Our key idea is to construct questions using the database schema, records, and functional dependencies such that they can be automatically verified.

Hallucination, Prompt Engineering
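ERBench's key idea of turning database records and functional dependencies into automatically verifiable questions can be illustrated with a minimal sketch. It assumes a hypothetical toy `Movie` table whose functional dependency (title, year) → director means each record yields a question with exactly one correct answer. The table, the helper names (`fd_holds`, `make_question`, `verify`), and the substring-based answer check are illustrative assumptions, not ERBench's implementation.

```python
from dataclasses import dataclass


# Hypothetical toy relation; schema, records, and FD are illustrative only.
@dataclass(frozen=True)
class Movie:
    title: str
    year: int
    director: str


MOVIES = [
    Movie("Inception", 2010, "Christopher Nolan"),
    Movie("Parasite", 2019, "Bong Joon-ho"),
]


def fd_holds(records) -> bool:
    """Check the assumed FD (title, year) -> director over all records."""
    seen = {}
    for r in records:
        key = (r.title, r.year)
        # setdefault returns the director already stored for this key, if any.
        if seen.setdefault(key, r.director) != r.director:
            return False
    return True


def make_question(m: Movie) -> str:
    """Phrase the FD's left-hand side as a question about its right-hand side."""
    return f"Who directed the movie {m.title} released in {m.year}?"


def verify(m: Movie, llm_answer: str) -> bool:
    """Simplified check: does the answer mention the FD-determined value?"""
    return m.director.lower() in llm_answer.lower()


if __name__ == "__main__":
    for m in MOVIES:
        print(make_question(m))
    print(verify(MOVIES[0], "It was directed by Christopher Nolan."))
```

Because the FD guarantees a unique director per (title, year), the expected answer comes straight from the record, so no human labeling is needed; a real benchmark would need more robust answer matching than a substring test.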

LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

no code implementations • 7 Feb 2024 • Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

By combining two complementary models, LEVI effectively suppresses problematic features in both the fine-tuning data and the pre-trained model while preserving useful features for new tasks.

Falcon: Fair Active Learning using Multi-armed Bandits

1 code implementation • 23 Jan 2024 • Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang

Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.

Active Learning, Attribute, +4 more

Quilt: Robust Data Segment Selection against Concept Drifts

no code implementations • 15 Dec 2023 • Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang

We contend that explicitly using the drifted data together with the new data leads to much better model accuracy, and we propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.

Personalized DP-SGD using Sampling Mechanisms

no code implementations • 24 May 2023 • Geon Heo, Junseok Seo, Steven Euijong Whang

Personalized privacy is becoming critical in deep learning for trustworthy AI.

Improving Fair Training under Correlation Shifts

no code implementations • 5 Feb 2023 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness.

Fairness

XClusters: Explainability-first Clustering

no code implementations • 22 Sep 2022 • Hyunseung Hwang, Steven Euijong Whang

We study the problem of explainability-first clustering where explainability becomes a first-class citizen for clustering.

Clustering

iFlipper: Label Flipping for Individual Fairness

1 code implementation • 15 Sep 2022 • Hantian Zhang, Ki Hyun Tae, Jaeyoung Park, Xu Chu, Steven Euijong Whang

We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips.

Fairness

Redactor: A Data-centric and Individualized Defense Against Inference Attacks

no code implementations • 7 Feb 2022 • Geon Heo, Steven Euijong Whang

Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and can scale to large data.

Data Poisoning

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

no code implementations • 13 Dec 2021 • Steven Euijong Whang, Yuji Roh, Hwanjun Song, Jae-Gil Lee

In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications.

BIG-bench Machine Learning, Fairness, +2 more

RegMix: Data Mixing Augmentation for Regression

no code implementations • 7 Jun 2021 • Seong-Hyeon Hwang, Steven Euijong Whang

Data augmentation is becoming essential for improving regression performance in critical applications including manufacturing, climate prediction, and finance.

Classification, Data Augmentation, +2 more

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

no code implementations • 7 Apr 2020 • Geon Heo, Yuji Roh, Seonghyeon Hwang, Dayun Lee, Steven Euijong Whang

We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification.

BIG-bench Machine Learning, Data Augmentation, +2 more

Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

2 code implementations • 10 Mar 2020 • Ki Hyun Tae, Steven Euijong Whang

We contend that data should be acquired selectively, and we propose Slice Tuner, which acquires possibly different amounts of data per slice so that model accuracy and fairness on all slices are optimized.

Active Learning, BIG-bench Machine Learning, +1 more

FR-Train: A Mutual Information-Based Approach to Fair and Robust Training

1 code implementation • ICML 2020 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning.

Data Poisoning, Fairness

FR-GAN: Fair and Robust Training

no code implementations • 25 Sep 2019 • Yuji Roh, Kangwook Lee, Gyeong Jo Hwang, Steven Euijong Whang, Changho Suh

We consider the problem of fair and robust model training in the presence of data poisoning.

Attribute, Data Poisoning, +1 more

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

no code implementations • 22 Apr 2019 • Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang

As machine learning is used in sensitive applications, it becomes imperative that the trained model is accurate, fair, and robust to attacks.

BIG-bench Machine Learning, Fairness, +1 more

A Survey on Data Collection for Machine Learning: a Big Data - AI Integration Perspective

no code implementations • 8 Nov 2018 • Yuji Roh, Geon Heo, Steven Euijong Whang

Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data.

BIG-bench Machine Learning, Feature Engineering, +1 more

Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach

no code implementations • 16 Jul 2018 • Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models.

Clustering, Fairness, +1 more
