Search Results for author: Steven Euijong Whang

Found 21 papers, 6 papers with code

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

1 code implementation • 8 Mar 2024 • Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang

Our key idea is to construct questions using the database schema, records, and functional dependencies such that they can be automatically verified.

Hallucination, Prompt Engineering

LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

no code implementations • 7 Feb 2024 • Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks.

Falcon: Fair Active Learning using Multi-armed Bandits

1 code implementation • 23 Jan 2024 • Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang

Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.

Active Learning, Attribute (+4 more)

Quilt: Robust Data Segment Selection against Concept Drifts

no code implementations • 15 Dec 2023 • Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang

However, we contend that explicitly utilizing the drifted data together leads to much better model accuracy and propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.

Personalized DP-SGD using Sampling Mechanisms

no code implementations • 24 May 2023 • Geon Heo, Junseok Seo, Steven Euijong Whang

Personalized privacy becomes critical in deep learning for Trustworthy AI.

Improving Fair Training under Correlation Shifts

no code implementations • 5 Feb 2023 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness.

Fairness

XClusters: Explainability-first Clustering

no code implementations • 22 Sep 2022 • Hyunseung Hwang, Steven Euijong Whang

We study the problem of explainability-first clustering where explainability becomes a first-class citizen for clustering.

Clustering

iFlipper: Label Flipping for Individual Fairness

1 code implementation • 15 Sep 2022 • Hantian Zhang, Ki Hyun Tae, Jaeyoung Park, Xu Chu, Steven Euijong Whang

We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips.

Fairness

Redactor: A Data-centric and Individualized Defense Against Inference Attacks

no code implementations • 7 Feb 2022 • Geon Heo, Steven Euijong Whang

Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and can scale to large data.

Data Poisoning

Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective

no code implementations • 13 Dec 2021 • Steven Euijong Whang, Yuji Roh, Hwanjun Song, Jae-Gil Lee

In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications.

BIG-bench Machine Learning, Fairness (+2 more)

Sample Selection for Fair and Robust Training

no code implementations • NeurIPS 2021 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

In this work, we propose a sample selection-based algorithm for fair and robust training.

Combinatorial Optimization, Fairness

RegMix: Data Mixing Augmentation for Regression

no code implementations • 7 Jun 2021 • Seong-Hyeon Hwang, Steven Euijong Whang

Data augmentation is becoming essential for improving regression performance in critical applications including manufacturing, climate prediction, and finance.

Classification, Data Augmentation (+2 more)

Responsible AI Challenges in End-to-end Machine Learning

no code implementations • 15 Jan 2021 • Steven Euijong Whang, Ki Hyun Tae, Yuji Roh, Geon Heo

Second, responsible AI must be broadly supported, preferably in all steps of machine learning.

BIG-bench Machine Learning, Fairness (+1 more)

FairBatch: Batch Selection for Model Fairness

1 code implementation • ICLR 2021 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

We address this problem via the lens of bilevel optimization.

BIG-bench Machine Learning, Bilevel Optimization (+1 more)

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

no code implementations • 7 Apr 2020 • Geon Heo, Yuji Roh, Seonghyeon Hwang, Dayun Lee, Steven Euijong Whang

We propose Inspector Gadget, an image labeling system that combines crowdsourcing, data augmentation, and data programming to produce weak labels at scale for image classification.

BIG-bench Machine Learning, Data Augmentation (+2 more)

Slice Tuner: A Selective Data Acquisition Framework for Accurate and Fair Machine Learning Models

2 code implementations • 10 Mar 2020 • Ki Hyun Tae, Steven Euijong Whang

Instead, we contend that one needs to selectively acquire data and propose Slice Tuner, which acquires possibly-different amounts of data per slice such that the model accuracy and fairness on all slices are optimized.

Active Learning, BIG-bench Machine Learning (+1 more)

FR-Train: A Mutual Information-Based Approach to Fair and Robust Training

1 code implementation • ICML 2020 • Yuji Roh, Kangwook Lee, Steven Euijong Whang, Changho Suh

Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning.

Data Poisoning, Fairness

FR-GAN: Fair and Robust Training

no code implementations • 25 Sep 2019 • Yuji Roh, Kangwook Lee, Gyeong Jo Hwang, Steven Euijong Whang, Changho Suh

We consider the problem of fair and robust model training in the presence of data poisoning.

Attribute, Data Poisoning (+1 more)

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

no code implementations • 22 Apr 2019 • Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang

As machine learning is used in sensitive applications, it becomes imperative that the trained model is accurate, fair, and robust to attacks.

BIG-bench Machine Learning, Fairness (+1 more)

A Survey on Data Collection for Machine Learning: a Big Data -- AI Integration Perspective

no code implementations • 8 Nov 2018 • Yuji Roh, Geon Heo, Steven Euijong Whang

Interestingly, recent research in data collection comes not only from the machine learning, natural language, and computer vision communities, but also from the data management community due to the importance of handling large amounts of data.

BIG-bench Machine Learning, Feature Engineering (+1 more)

Automated Data Slicing for Model Validation: A Big data - AI Integration Approach

no code implementations • 16 Jul 2018 • Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang

As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models.

Clustering, Fairness (+1 more)
