Search Results for author: Seunghyun Yoon

Found 49 papers, 23 papers with code

Multimodal Speech Emotion Recognition Using Audio and Text

4 code implementations 10 Oct 2018 Seunghyun Yoon, Seokhyun Byun, Kyomin Jung

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers.

Emotion Classification Multimodal Emotion Recognition +2

Fine-grained Image Captioning with CLIP Reward

1 code implementation Findings (NAACL) 2022 Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on huge image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Caption Generation Descriptive +5
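As a rough illustration of the reward described in the entry above, the sketch below scores candidate captions by CLIP image-text cosine similarity. It assumes the openai/CLIP package and the ViT-B/32 checkpoint; the reward shaping is illustrative, not the paper's exact formulation.

```python
# Minimal sketch: CLIP image-text similarity as a caption reward (assumption: openai/CLIP).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def clip_reward(image_path, captions):
    """Return one cosine-similarity reward per candidate caption."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = clip.tokenize(captions).to(device)
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (txt_feat @ img_feat.T).squeeze(-1)

# rewards = clip_reward("photo.jpg", ["a dog on a beach", "a cat indoors"])
```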

Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning

1 code implementation ACL 2020 Joongbo Shin, Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung

Even though BERT achieves successful performance improvements in various supervised learning tasks, applying BERT to unsupervised tasks is still limited by the need for repetitive inference to compute contextual language representations.

Language Modelling Semantic Similarity +1

Simple Questions Generate Named Entity Recognition Datasets

1 code implementation 16 Dec 2021 Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

Recent named entity recognition (NER) models often rely on human-annotated datasets, which require significant professional knowledge of the target domain and entities.

Few-shot NER Named Entity Recognition +1

Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder

2 code implementations 17 Nov 2018 Seunghyun Yoon, Kunwoo Park, Joongbo Shin, Hongjun Lim, Seungpil Won, Meeyoung Cha, Kyomin Jung

Some news headlines mislead readers with overrated or false information, and identifying them in advance will better assist readers in choosing proper news stories to consume.

Data Augmentation Fake News Detection +2

Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering

3 code implementations NAACL 2018 Seunghyun Yoon, Joongbo Shin, Kyomin Jung

In this paper, we propose a novel end-to-end neural architecture for ranking candidate answers that adapts a hierarchical recurrent neural network and a latent topic clustering module.

Answer Selection Clustering +1
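The following is a minimal, hedged sketch of the idea in the entry above: a recurrent encoder for questions and answers augmented with a latent topic (cluster) memory whose soft assignments enrich the pair representation before scoring. It collapses the hierarchical encoder into a single GRU; the dimensions, cluster count, and bilinear scoring head are assumptions, not the paper's configuration.

```python
# Sketch: GRU encoder + latent topic clustering memory for question-answer ranking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RnnLtcRanker(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_topics=8):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.topics = nn.Parameter(torch.randn(n_topics, hid_dim))  # latent topic memory
        self.score = nn.Bilinear(2 * hid_dim, 2 * hid_dim, 1)

    def encode(self, ids):
        _, h = self.rnn(self.emb(ids))               # h: (1, B, hid_dim)
        h = h.squeeze(0)
        attn = F.softmax(h @ self.topics.T, dim=-1)  # soft cluster assignment
        topic = attn @ self.topics                   # topic-enriched summary
        return torch.cat([h, topic], dim=-1)

    def forward(self, q_ids, a_ids):
        return self.score(self.encode(q_ids), self.encode(a_ids)).squeeze(-1)

# ranker = RnnLtcRanker(vocab_size=30000)
# scores = ranker(question_ids, candidate_answer_ids)  # higher = better match
```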

Attentive Modality Hopping Mechanism for Speech Emotion Recognition

1 code implementation 29 Nov 2019 Seunghyun Yoon, Subhadeep Dey, Hwanhee Lee, Kyomin Jung

In this work, we explore the impact of visual modality in addition to speech and text for improving the accuracy of the emotion detection system.

Emotion Classification Multimodal Emotion Recognition +1

UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

1 code implementation ACL 2021 Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Kyomin Jung

Also, we observe critical problems with the previous benchmark dataset (i.e., human annotations) for image captioning metrics, and introduce a new collection of human annotations on the generated captions.

Contrastive Learning Image Captioning +1

Moment Detection in Long Tutorial Videos

1 code implementation ICCV 2023 Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui

To study this problem, we propose the first dataset of untrimmed, long-form tutorial videos for the task of Moment Detection called the Behance Moment Detection (BMD) dataset.

How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image

1 code implementation CONSTRAINT (ACL) 2022 Hyewon Choi, Yejun Yoon, Seunghyun Yoon, Kunwoo Park

This study investigates how fake news uses thumbnails, focusing on whether a news article's thumbnail correctly represents the news content.

Misinformation

CAISE: Conversational Agent for Image Search and Editing

1 code implementation 24 Feb 2022 Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal

To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.

Image Retrieval

PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

1 code implementation 19 Jul 2022 Thang M. Pham, Seunghyun Yoon, Trung Bui, Anh Nguyen

While contextualized word embeddings have become a de-facto standard, learning contextualized phrase embeddings is less explored and is hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of phrases alone).

Information Retrieval Natural Language Understanding +5

PEEB: Part-based Image Classifiers with an Explainable and Editable Language Bottleneck

1 code implementation 8 Mar 2024 Thang M. Pham, Peijie Chen, Tin Nguyen, Seunghyun Yoon, Trung Bui, Anh Totti Nguyen

CLIP-based classifiers rely on the prompt containing a {class name} that is known to the text encoder.

Propagate-Selector: Detecting Supporting Sentences for Question Answering via Graph Neural Networks

1 code implementation LREC 2020 Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

In this study, we propose a novel graph neural network called propagate-selector (PS), which propagates information over sentences to understand information that cannot be inferred when considering sentences in isolation.

Answer Selection Sentence
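A minimal sketch of the propagation idea in the entry above: the question and candidate sentences become nodes of a fully connected graph, and each hop aggregates neighbor information and updates node states. The gated mean-aggregation update here is an illustrative assumption, not the exact propagate-selector formulation.

```python
# Sketch: message passing between a question node and sentence nodes.
import torch
import torch.nn as nn

class SentencePropagator(nn.Module):
    def __init__(self, dim=256, hops=3):
        super().__init__()
        self.hops = hops
        self.update = nn.GRUCell(dim, dim)   # node update from aggregated neighbors

    def forward(self, node_feats):
        # node_feats: (N, dim) -- node 0 is the question, nodes 1..N-1 are sentences
        n = node_feats.size(0)
        adj = torch.ones(n, n, device=node_feats.device) - torch.eye(n, device=node_feats.device)
        adj = adj / adj.sum(dim=-1, keepdim=True)       # mean aggregation, no self-loops
        h = node_feats
        for _ in range(self.hops):
            messages = adj @ h                          # gather neighbor information
            h = self.update(messages, h)                # hop-wise node update
        return h                                        # refined question/sentence states

# h = SentencePropagator()(node_feats)
# supporting-sentence scores could then be, e.g., h[1:] @ h[0]
```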

Virtual Knowledge Graph Construction for Zero-Shot Domain-Specific Document Retrieval

1 code implementation COLING 2022 Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh

We conduct three experiments: 1) domain-specific document retrieval, 2) a comparison of our virtual knowledge graph construction method with previous approaches, and 3) an ablation study on each component of our virtual knowledge graph.

Domain Adaptation graph construction +2

Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision

1 code implementation COLING 2022 Khalil Mrini, Harpreet Singh, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Emilia Farcas, Ndapa Nakashole

The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document.

Question Answering Retrieval
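A minimal sketch of the two-stage pipeline described in the entry above (match the summarized user question to an FAQ, then retrieve a fixed number of relevant sentences from its answer), using TF-IDF cosine similarity as a stand-in for the paper's neural matching and retrieval components.

```python
# Sketch: FAQ matching followed by top-k sentence retrieval (TF-IDF stand-in).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_question(user_question, faq_questions, faq_answers, k=3):
    vec = TfidfVectorizer().fit(faq_questions + faq_answers + [user_question])
    # Stage 1: match the (summarized) user question to the closest FAQ question.
    sims = cosine_similarity(vec.transform([user_question]),
                             vec.transform(faq_questions))[0]
    best = sims.argmax()
    # Stage 2: retrieve a fixed number of relevant sentences from that FAQ's answer.
    sentences = [s.strip() for s in faq_answers[best].split(".") if s.strip()]
    sent_sims = cosine_similarity(vec.transform([user_question]),
                                  vec.transform(sentences))[0]
    top = sent_sims.argsort()[::-1][:k]
    return [sentences[i] for i in sorted(top)]
```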

Efficient Transfer Learning Schemes for Personalized Language Modeling using Recurrent Neural Network

no code implementations 13 Jan 2017 Seunghyun Yoon, Hyeongu Yun, Yuna Kim, Gyu-tae Park, Kyomin Jung

In this paper, we propose efficient transfer learning methods for training a personalized language model using a recurrent neural network with a long short-term memory architecture.

Language Modelling Transfer Learning

A Compare-Aggregate Model with Latent Clustering for Answer Selection

no code implementations 30 May 2019 Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

In this paper, we propose a novel method for a sentence-level answer-selection task that is a fundamental problem in natural language processing.

Answer Selection Clustering +3

BaitWatcher: A lightweight web interface for the detection of incongruent news headlines

no code implementations 23 Mar 2020 Kunwoo Park, Taegyun Kim, Seunghyun Yoon, Meeyoung Cha, Kyomin Jung

In digital environments where substantial amounts of information are shared online, news headlines play essential roles in the selection and diffusion of news articles.

Misinformation

DSTC8-AVSD: Multimodal Semantic Transformer Network with Retrieval Style Word Generator

no code implementations 1 Apr 2020 Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung

Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response for a question with a given scene, video, audio, and the history of previous turns in the dialog.

Retrieval Word Embeddings

Collaborative Training of GANs in Continuous and Discrete Spaces for Text Generation

no code implementations 16 Oct 2020 Yanghoon Kim, Seungpil Won, Seunghyun Yoon, Kyomin Jung

Applying generative adversarial networks (GANs) to text-related tasks is challenging due to the discrete nature of language.

Reinforcement Learning (RL) Text Generation

MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction

no code implementations COLING 2022 Amir Pouran Ben Veyseh, Nicole Meister, Seunghyun Yoon, Rajiv Jain, Franck Dernoncourt, Thien Huu Nguyen

Acronym extraction is the task of identifying acronyms and their expanded forms in text, which is necessary for various NLP applications.

Multimodal Intent Discovery from Livestream Videos

no code implementations Findings (NAACL) 2022 Adyasha Maharana, Quan Tran, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Mohit Bansal

We construct and present a new multimodal dataset consisting of software instructional livestreams and containing manual annotations for both detailed and abstract procedural intent that enable training and evaluation of joint video and text understanding models.

Intent Discovery Video Summarization +1

Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text

no code implementations 26 Jul 2022 Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung

Then, the attention weights of each modality are applied directly to the other modality in a crossed way, so that the CAN gathers the audio and text information from the same time steps based on each modality.

Speech Emotion Recognition
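A minimal sketch of the crossed-attention idea described in the entry above: per-time-step attention weights computed from one modality are applied to the other, time-aligned modality before pooling and classification. Layer sizes and the classification head are illustrative assumptions.

```python
# Sketch: crossed application of attention weights between aligned audio and text.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossedAttentionFusion(nn.Module):
    def __init__(self, dim=128, n_classes=4):
        super().__init__()
        self.audio_scorer = nn.Linear(dim, 1)   # attention over audio time steps
        self.text_scorer = nn.Linear(dim, 1)    # attention over text time steps
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, audio, text):
        # audio, text: (B, T, dim), aligned so time step t covers the same span
        w_audio = F.softmax(self.audio_scorer(audio), dim=1)  # (B, T, 1)
        w_text = F.softmax(self.text_scorer(text), dim=1)     # (B, T, 1)
        # crossed application: audio-derived weights pool text, and vice versa
        text_pooled = (w_audio * text).sum(dim=1)
        audio_pooled = (w_text * audio).sum(dim=1)
        return self.classifier(torch.cat([audio_pooled, text_pooled], dim=-1))
```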

Offensive Content Detection via Synthetic Code-Switched Text

no code implementations COLING 2022 Cesa Salaam, Franck Dernoncourt, Trung Bui, Danda Rawat, Seunghyun Yoon

The prevalent use of offensive content on social media has become an important concern for online platforms (customer service chat boxes, social media platforms, etc.).

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

no code implementations 15 Mar 2023 Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung

Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning.

Image Captioning

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

no code implementations 24 Jul 2023 Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen

Punctuation restoration is an important task in automatic speech recognition (ASR) which aims to restore the syntactic structure of generated ASR texts to improve readability.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Multilingual Sentence-Level Semantic Search using Meta-Distillation Learning

no code implementations 15 Sep 2023 Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon

Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner.

Sentence
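For orientation, the sketch below shows a generic first-order MAML meta-update (inner-loop adaptation on a support set, outer-loop update from the query loss). It is a simplified stand-in and does not include the paper's meta-distillation objective or multilingual retrieval setup.

```python
# Sketch: one first-order MAML meta-update over a batch of tasks.
import copy
import torch
import torch.nn.functional as F

def fomaml_step(model, meta_opt, tasks, inner_lr=1e-2, inner_steps=3):
    """Each task is (support_x, support_y, query_x, query_y)."""
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        learner = copy.deepcopy(model)                       # task-specific copy
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                         # inner-loop adaptation
            loss = F.cross_entropy(learner(support_x), support_y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        query_loss = F.cross_entropy(learner(query_x), query_y)
        grads = torch.autograd.grad(query_loss, learner.parameters())
        # first-order approximation: accumulate adapted-model gradients on the meta-model
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```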

Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool

no code implementations 15 Aug 2023 Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung

With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem.

Contrastive Learning Moment Retrieval +3

Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

no code implementations 30 Nov 2023 Linzi Xing, Quan Tran, Fabian Caba, Franck Dernoncourt, Seunghyun Yoon, Zhaowen Wang, Trung Bui, Giuseppe Carenini

Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks.

Contrastive Learning Segmentation +2

Understanding News Thumbnail Representativeness by Counterfactual Text-Guided Contrastive Language-Image Pretraining

no code implementations 17 Feb 2024 Yejun Yoon, Seunghyun Yoon, Kunwoo Park

To address this challenge, we introduce NewsTT, a manually annotated dataset of news thumbnail image and text pairs.

counterfactual

Fine-tuning CLIP Text Encoders with Two-step Paraphrasing

no code implementations 23 Feb 2024 Hyunjae Kim, Seunghyun Yoon, Trung Bui, Handong Zhao, Quan Tran, Franck Dernoncourt, Jaewoo Kang

Contrastive language-image pre-training (CLIP) models have demonstrated considerable success across various vision-language tasks, such as text-to-image retrieval, where the model is required to effectively process natural language input to produce an accurate visual output.

Image Captioning Image Retrieval +3

Scaling Up Video Summarization Pretraining with Large Language Models

no code implementations 4 Apr 2024 Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

Video Alignment Video Summarization

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document

no code implementations 17 Apr 2024 Joonho Yang, Seunghyun Yoon, Byeongjeong Kim, Hwanhee Lee

These atomic facts represent a more fine-grained unit of information, facilitating detailed understanding and interpretability of the summary's factual inconsistency.

Abstractive Text Summarization
