Search Results for author: Dongwon Lee

Found 38 papers, 19 papers with code

Authorship Attribution for Neural Text Generation

no code implementations • EMNLP 2020 • Adaku Uchendu, Thai Le, Kai Shu, Dongwon Lee

In recent years, the task of generating realistic short and long texts has made tremendous advancements.

Authorship Attribution • Text Generation

ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis

1 code implementation • 15 Apr 2024 • Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee

With Large Language Models (LLMs) achieving success in language and commonsense reasoning tasks, we explore the ability of different LLMs to identify and understand key subjects from abstractive captions.

Descriptive • Image Captioning +2

Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations

no code implementations • 4 Apr 2024 • Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee

This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present).

Hallucination • Human Detection

ALISON: Fast and Effective Stylometric Authorship Obfuscation

1 code implementation • 1 Feb 2024 • Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee

AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship.

Authorship Attribution

A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

no code implementations • 14 Nov 2023 • Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee

In the realm of text manipulation and linguistic transformation, the question of authorship has always been a subject of fascination and philosophical inquiry.

Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation

1 code implementation • 24 Oct 2023 • Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee

The recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused (i.e., generating large-scale harmful and misleading content).

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

1 code implementation • 20 Oct 2023 • Dominik Macko, Robert Moro, Adaku Uchendu, Jason Samuel Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova

There is a lack of research into the capabilities of recent LLMs to generate convincing text in languages other than English, and into the performance of detectors of machine-generated text in multilingual settings.

Benchmarking • Text Detection

GPT-who: An Information Density-based Machine-Generated Text Detector

1 code implementation • 9 Oct 2023 • Saranya Venkatraman, Adaku Uchendu, Dongwon Lee

We examine if this UID principle can help capture differences between Large Language Model (LLM)-generated and human-generated texts.

Authorship Attribution
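
For readers unfamiliar with the Uniform Information Density (UID) idea mentioned in the excerpt above, the following is a minimal, hypothetical sketch of UID-style surprisal features. It assumes the Hugging Face transformers library with GPT-2 as the scoring model, and the summary statistics shown are illustrative rather than GPT-who's exact feature set.

```python
# Hypothetical illustration of UID-style surprisal features (not GPT-who's exact features):
# score a text with a causal LM and summarize how evenly information is spread across tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def uid_features(text: str) -> dict:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Per-token surprisal: -log p(token_t | tokens_<t), skipping the first token.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    surprisal = -log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    # UID-inspired summaries: mean surprisal and its variance (lower variance = more uniform).
    return {"mean_surprisal": surprisal.mean().item(), "surprisal_variance": surprisal.var().item()}

print(uid_features("The quick brown fox jumps over the lazy dog."))
```

A detector built on this intuition would compare such uniformity statistics across texts of known origin; the feature names and choice of summary statistics here are assumptions for illustration only.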

TOPFORMER: Topology-Aware Authorship Attribution of Deepfake Texts with Diverse Writing Styles

no code implementations • 22 Sep 2023 • Adaku Uchendu, Thai Le, Dongwon Lee

We propose TopFormer, which improves existing AA solutions by capturing more linguistic patterns in deepfake texts through a Topological Data Analysis (TDA) layer added to a Transformer-based model.

Authorship Attribution • Face Swapping +3
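
The paper's TDA layer itself is not reproduced here. As a rough, hypothetical illustration of the kind of topological summary such a layer could consume, the sketch below computes 0-dimensional persistence lifetimes (connected-component merge heights) over the rows of an attention matrix using single-linkage clustering from SciPy; the feature choice and function names are assumptions, not the paper's design.

```python
# Hypothetical topological summary of an attention map, NOT the paper's TDA layer:
# 0-dimensional persistence lifetimes (component merge heights) computed with
# single-linkage clustering over the rows of an attention matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage

def h0_total_persistence(attention: np.ndarray) -> float:
    """attention: (seq_len x seq_len) attention weights for one head."""
    # Each row is a token's attention distribution, treated as a point in R^seq_len.
    merges = linkage(attention, method="single")   # shape (seq_len - 1, 4)
    # Column 2 holds the distances at which components merge, i.e., H0 death times
    # (all components are born at filtration value 0).
    return float(merges[:, 2].sum())

# Toy usage with a random row-stochastic "attention" matrix.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
A = A / A.sum(axis=1, keepdims=True)
print(h0_total_persistence(A))
```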

Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?

2 code implementations • 3 Apr 2023 • Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao 'Kenneth' Huang, Dongwon Lee

Advances in Large Language Models (e.g., GPT-4, LLaMA) have improved the generation of coherent sentences resembling human writing on a large scale, resulting in the creation of so-called deepfake texts.

Face Swapping • Human Detection +1

NoisyHate: Benchmarking Content Moderation Machine Learning Models with Human-Written Perturbations Online

no code implementations • 18 Mar 2023 • Yiran Ye, Thai Le, Dongwon Lee

In this paper, we introduce a benchmark test set of human-written perturbations collected online for toxic speech detection models.

Adversarial Attack • Benchmarking +1

Imputing Knowledge Tracing Data with Subject-Based Training via LSTM Variational Autoencoders Frameworks

no code implementations • 24 Feb 2023 • Jia Tracy Shen, Dongwon Lee

Finally, the paper compares model performance between training on the original data and training on data imputed with generated data from the non-subject-based model (VAE-NS) and the subject-based training models (i.e., VAE and LVAE).

Knowledge Tracing

CRYPTEXT: Database and Interactive Toolkit of Human-Written Text Perturbations in the Wild

no code implementations • 16 Jan 2023 • Thai Le, Ye Yiran, Yifan Hu, Dongwon Lee

CRYPTEXT is a data-intensive application that provides users with a database and several tools to extract and interact with human-written perturbations.

ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions

1 code implementation • 5 Jan 2023 • Aashish Anantha Ramakrishnan, Sharon X. Huang, Dongwon Lee

Advancements in Text-to-Image synthesis over recent years have focused more on improving the quality of generated samples on datasets with descriptive captions.

Benchmarking • Descriptive +2

Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

no code implementations • 19 Oct 2022 • Adaku Uchendu, Thai Le, Dongwon Lee

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO).

Attribute • Authorship Attribution +1

Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense

1 code implementation • Findings (ACL) 2022 • Thai Le, Jooyoung Lee, Kevin Yen, Yifan Hu, Dongwon Lee

We find that adversarial texts generated by ANTHRO achieve the best trade-off between (1) attack success rate, (2) semantic preservation of the original text, and (3) stealthiness, i.e., being indistinguishable from human writing and hence harder to flag as suspicious.

Adversarial Attack

Do Language Models Plagiarize?

1 code implementation • 15 Mar 2022 • Jooyoung Lee, Thai Le, Jinghui Chen, Dongwon Lee

Our results suggest that (1) three types of plagiarism widely exist in LMs beyond memorization, (2) both size and decoding methods of LMs are strongly associated with the degrees of plagiarism they exhibit, and (3) fine-tuned LMs' plagiarism patterns vary based on their corpus similarity and homogeneity.

Language Modelling • Memorization +1

JAMES: Normalizing Job Titles with Multi-Aspect Graph Embeddings and Reasoning

no code implementations • 22 Feb 2022 • Michiharu Yamashita, Jia Tracy Shen, Thanh Tran, Hamoon Ekhtiari, Dongwon Lee

In online job marketplaces, it is important to establish a well-defined job title taxonomy for various downstream tasks (e.g., job recommendation, users' career analysis, and turnover prediction).

Logical Reasoning • Semantic Similarity +1

Socialbots on Fire: Modeling Adversarial Behaviors of Socialbots via Multi-Agent Hierarchical Reinforcement Learning

no code implementations • 20 Oct 2021 • Thai Le, Long Tran-Thanh, Dongwon Lee

In answer to this question, we successfully demonstrate that it is indeed possible for adversaries to exploit computational learning mechanisms such as reinforcement learning (RL) to maximize the influence of socialbots while avoiding detection.

Adversarial Attack • Hierarchical Reinforcement Learning +2

MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education

1 code implementation • 2 Jun 2021 • Jia Tracy Shen, Michiharu Yamashita, Ethan Prihar, Neil Heffernan, Xintao Wu, Ben Graff, Dongwon Lee

Due to the nature of mathematical texts, which often use domain-specific vocabulary along with equations and math symbols, we posit that the development of a new BERT model for mathematics would be useful for many mathematical downstream tasks.

Knowledge Tracing • Language Modelling +2

Large-Scale Data-Driven Airline Market Influence Maximization

no code implementations • 31 May 2021 • Duanshun Li, Jing Liu, Jinsung Jeon, Seoyoung Hong, Thai Le, Dongwon Lee, Noseong Park

On top of the prediction models, we define a budget-constrained flight frequency optimization problem to maximize the market influence over 2,262 routes.

Classifying Math KCs via Task-Adaptive Pre-Trained BERT

no code implementations • 24 May 2021 • Jia Tracy Shen, Michiharu Yamashita, Ethan Prihar, Neil Heffernan, Xintao Wu, Sean McGrew, Dongwon Lee

Educational content labeled with proper knowledge components (KCs) is particularly useful to teachers or content organizers.

Math Task 2

Detecting micro fractures: A comprehensive comparison of conventional and machine-learning based segmentation methods

no code implementations • 23 Mar 2021 • Dongwon Lee, Nikolaos Karadimitriou, Matthias Ruf, Holger Steeb

The segmentation results from all five methods are compared to each other in terms of segmentation quality and time efficiency.

Segmentation

SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher

1 code implementation • ACL 2022 • Thai Le, Noseong Park, Dongwon Lee

Even though several methods have been proposed to defend textual neural network (NN) models against black-box adversarial attacks, they often defend against a specific text perturbation strategy and/or require re-training the models from scratch.

Adversarial Robustness

Achieving User-Side Fairness in Contextual Bandits

no code implementations • 22 Oct 2020 • Wen Huang, Kevin Labille, Xintao Wu, Dongwon Lee, Neil Heffernan

Personalized recommendation based on multi-armed bandit (MAB) algorithms has been shown to lead to high utility and efficiency, as it can dynamically adapt the recommendation strategy based on feedback.

Fairness • Multi-Armed Bandits
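
As a generic illustration of the contextual-bandit recommendation setting described above, the sketch below implements plain LinUCB in NumPy; it does not model the paper's user-side fairness mechanism, and all names are hypothetical.

```python
# Generic LinUCB contextual bandit sketch (illustrative only; the paper's
# user-side fairness constraint is NOT modeled here).
import numpy as np

class LinUCB:
    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Estimated reward plus an upper-confidence exploration bonus.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy usage: 3 candidate items, 5-dimensional user context, random feedback.
rng = np.random.default_rng(0)
bandit = LinUCB(n_arms=3, dim=5)
for _ in range(100):
    context = rng.normal(size=5)
    arm = bandit.select(context)
    bandit.update(arm, context, reward=float(rng.random() < 0.5))
```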

MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models

1 code implementation • 1 Sep 2020 • Thai Le, Suhang Wang, Dongwon Lee

In recent years, the proliferation of so-called "fake news" has caused much disruption in society and weakened the news ecosystem.

Comment Generation • Fake News Detection

CoAID: COVID-19 Healthcare Misinformation Dataset

2 code implementations • 22 May 2020 • Limeng Cui, Dongwon Lee

As the COVID-19 virus quickly spreads around the world, unfortunately, misinformation related to COVID-19 also gets created and spreads like wildfire.

Misinformation

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements

1 code implementation • 2 Jan 2020 • Kai Shu, Suhang Wang, Dongwon Lee, Huan Liu

In recent years, disinformation, including fake news, has become a global phenomenon due to its explosive growth, particularly on social media.

Ethics • Fact Checking

GRACE: Generating Concise and Informative Contrastive Sample to Explain Neural Network Model's Prediction

1 code implementation • 5 Nov 2019 • Thai Le, Suhang Wang, Dongwon Lee

Despite the recent development in the topic of explainable AI/ML for image and text data, the majority of current solutions are not suitable to explain the prediction of neural network models when the datasets are tabular and their features are in high-dimensional vectorized formats.

Philosophy

Deep Reinforcement Learning for Personalized Search Story Recommendation

no code implementations • 26 Jul 2019 • Jason Zhang, Junming Yin, Dongwon Lee, Linhong Zhu

In recent years, "search story", a combined display with other organic channels, has become a major source of user traffic on platforms such as e-commerce search platforms, news feed platforms, and web and image search platforms.

Image Retrieval • Imitation Learning +2

FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media

7 code implementations • 5 Sep 2018 • Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, Huan Liu

However, fake news detection is a non-trivial task, which requires multi-source information such as news content, social context, and dynamic information.

Social and Information Networks

Regularizing Matrix Factorization with User and Item Embeddings for Recommendation

2 code implementations • 31 Aug 2018 • Thanh Tran, Kyumin Lee, Yiming Liao, Dongwon Lee

Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked.
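
The excerpt above describes four co-occurrence signals that the multi-embedding model jointly decomposes. As a hypothetical sketch of one such signal, the code below builds the item-item "co-liked" statistic as a shifted positive PMI (SPPMI) matrix from a binary user-item matrix, mirroring the word-embedding-style signals described; it is not the authors' released RME code, and the function and parameter names are illustrative.

```python
# Hypothetical sketch of the "which two items users often co-liked" signal:
# count co-liked item pairs from a binary user-item matrix and convert the counts
# into a shifted positive PMI (SPPMI) matrix that an embedding model could factorize.
import numpy as np

def co_like_sppmi(R: np.ndarray, shift: float = 1.0) -> np.ndarray:
    """R: binary user-item matrix of shape (n_users, n_items)."""
    co = R.T @ R                       # co[i, j] = number of users who liked both i and j
    np.fill_diagonal(co, 0.0)
    total = co.sum()
    item_counts = co.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((co * total) / (item_counts @ item_counts.T))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi - np.log(shift), 0.0)   # shifted positive PMI

# Toy usage: 4 users x 5 items of implicit "like" feedback.
R = np.array([[1, 1, 0, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1]], dtype=float)
print(co_like_sppmi(R).round(2))
```

The co-disliked and user-user co-like matrices described in the abstract could be built the same way from the complementary or transposed feedback matrix.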
