1 code implementation • COLING 2022 • Yeon Seonwoo, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Alice Oh
We conduct three experiments: 1) domain-specific document retrieval, 2) a comparison of our virtual knowledge graph construction method with previous approaches, and 3) an ablation study on each component of our virtual knowledge graph.
no code implementations • Findings (NAACL) 2022 • Adyasha Maharana, Quan Tran, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Mohit Bansal
We construct and present a new multimodal dataset consisting of software instructional livestreams and containing manual annotations for both detailed and abstract procedural intent that enable training and evaluation of joint video and text understanding models.
no code implementations • COLING 2022 • Amir Pouran Ben Veyseh, Quan Hung Tran, Seunghyun Yoon, Varun Manjunatha, Hanieh Deilamsalehy, Rajiv Jain, Trung Bui, Walter W. Chang, Franck Dernoncourt, Thien Huu Nguyen
To this end, this work studies new challenges of KP in transcripts of videos, an understudied domain for KP that involves informal texts and non-cohesive presentation styles.
Automatic Speech Recognition (ASR)
1 code implementation • EMNLP (Eval4NLP) 2020 • Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
In this paper, we propose an evaluation metric for image captioning systems using both image and text information.
no code implementations • NAACL (BioNLP) 2021 • Khalil Mrini, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Emilia Farcas, Ndapa Nakashole
We show that both transfer learning methods combined achieve the highest ROUGE scores.
no code implementations • COLING 2022 • Cesa Salaam, Franck Dernoncourt, Trung Bui, Danda Rawat, Seunghyun Yoon
The prevalent use of offensive content in social media has become an important reason for concern for online platforms (customer service chat-boxes, social media platforms, etc.).
no code implementations • 30 Nov 2023 • Linzi Xing, Quan Tran, Fabian Caba, Franck Dernoncourt, Seunghyun Yoon, Zhaowen Wang, Trung Bui, Giuseppe Carenini
Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks.
no code implementations • 8 Nov 2023 • Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
no code implementations • 7 Nov 2023 • Zhongfen Deng, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Quan Hung Tran, Shuaiqi Liu, Wenting Zhao, Tao Zhang, Yibo Wang, Philip S. Yu
Then we merge the sentences selected for a specific aspect as the input for the summarizer to produce the aspect-based summary.
no code implementations • 15 Sep 2023 • Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon
Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner.
no code implementations • 24 Jul 2023 • Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen
Punctuation restoration is an important task in automatic speech recognition (ASR) which aims to restore the syntactic structure of generated ASR texts to improve readability.
Automatic Speech Recognition (ASR)
1 code implementation • ICCV 2023 • Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
no code implementations • 12 Apr 2023 • Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen
The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research.
1 code implementation • ICCV 2023 • Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang
We then impose spatial attention control by combining the attention over the entire text description and that over the local description of the particular object in the corresponding pixel region of that object.
no code implementations • 15 Mar 2023 • Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung
Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning.
1 code implementation • CVPR 2023 • Bo He, Jun Wang, JieLin Qiu, Trung Bui, Abhinav Shrivastava, Zhaowen Wang
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Ranked #3 on Supervised Video Summarization on SumMe
Extractive Text Summarization
Supervised Video Summarization
1 code implementation • ICCV 2023 • Ioana Croitoru, Simion-Vlad Bogolin, Samuel Albanie, Yang Liu, Zhaowen Wang, Seunghyun Yoon, Franck Dernoncourt, Hailin Jin, Trung Bui
To study this problem, we propose the first dataset of untrimmed, long-form tutorial videos for the task of Moment Detection called the Behance Moment Detection (BMD) dataset.
1 code implementation • CVPR 2023 • Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang
Based on this finding, we further propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
no code implementations • 12 Oct 2022 • JieLin Qiu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Ding Zhao, Hailin Jin
Livestream videos have become a significant part of online learning, where design, digital marketing, creative painting, and other skills are taught by experienced experts in the sessions, making them valuable materials.
no code implementations • 10 Oct 2022 • JieLin Qiu, Jiacheng Zhu, Mengdi Xu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Bo Li, Ding Zhao, Hailin Jin
Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding.
1 code implementation • COLING 2022 • Khalil Mrini, Harpreet Singh, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Emilia Farcas, Ndapa Nakashole
The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document.
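The two-stage flow described above (match the question to an FAQ, then return a fixed number of sentences from its answer document) can be sketched as follows. This is a minimal illustration, not the authors' system: the token-overlap (Jaccard) similarity, the function names, and the dictionary layout of the knowledge base are all assumptions made for the example.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity; a stand-in for whatever matcher the real system uses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer(question: str, faqs: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Stage 1: pick the FAQ closest to the question.
    Stage 2: return the k answer sentences closest to the question.

    `faqs` is a hypothetical knowledge base mapping each FAQ question
    to the sentences of its answer document.
    """
    q = tokens(question)
    best_faq = max(faqs, key=lambda f: jaccard(q, tokens(f)))
    ranked = sorted(faqs[best_faq], key=lambda s: jaccard(q, tokens(s)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    kb = {
        "What causes headaches?": [
            "Headaches can be caused by stress.",
            "Drink water regularly.",
            "Dehydration causes headaches too.",
        ],
        "How to treat a cold?": ["Rest and fluids help a cold."],
    }
    print(answer("what causes my headaches", kb, k=2))
```

Keeping `k` fixed mirrors the "fixed number of relevant sentences" in the description; a learned matcher could replace `jaccard` without changing the two-stage structure.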
1 code implementation • British Machine Vision Conference (BMVC) 2022 • Nguyen H. Tran, Ta Duc Huy, Soan T. M. Duong, Phan Nguyen, Dao Huu Hung, Chanh D. Tr. Nguyen, Trung Bui, Steven Q.H. Truong
ViT is adapted on each patch to employ the attention mechanism across the 3 × 3 cells to count the number of people in the central cell.
Ranked #2 on Crowd Counting on ShanghaiTech A
1 code implementation • 19 Jul 2022 • Thang M. Pham, Seunghyun Yoon, Trung Bui, Anh Nguyen
While contextualized word embeddings have been a de facto standard, learning contextualized phrase embeddings is less explored and is hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of phrases alone).
1 code implementation • Findings (NAACL) 2022 • Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal
Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge number of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.
Ranked #26 on Image Captioning on COCO Captions
no code implementations • 18 Apr 2022 • Hwanhee Lee, Cheoneum Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Juae Kim, Kyomin Jung
In this paper, we propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
no code implementations • 7 Apr 2022 • JieLin Qiu, Jiacheng Zhu, Mengdi Xu, Franck Dernoncourt, Trung Bui, Zhaowen Wang, Bo Li, Ding Zhao, Hailin Jin
Multimedia summarization with multimodal output can play an essential role in real-world applications, i.e., automatically generating cover images and titles for news articles or providing introductions to online videos.
1 code implementation • 24 Feb 2022 • Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal
To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests.
1 code implementation • 22 Oct 2021 • Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen
We find two reasons why IM is not better than LOO: (1) deleting a single word from the input only marginally reduces a classifier's accuracy; and (2) a highly predictable word is always given near-zero attribution, regardless of its true importance to the classifier.
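For readers unfamiliar with leave-one-out (LOO) attribution, the idea is to score each word by how much the classifier output drops when that word is deleted. The sketch below is a generic illustration of LOO only, not the paper's code: the toy scoring function and the word list are hypothetical stand-ins for a real classifier.

```python
from typing import Callable, List


def loo_attributions(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[float]:
    """Leave-one-out: attribution of token i is
    score(all tokens) - score(all tokens except i)."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


# Hypothetical toy "classifier": counts words from a small positive lexicon.
POSITIVE = {"great", "good"}


def toy_score(tokens: List[str]) -> float:
    return float(sum(t in POSITIVE for t in tokens))


if __name__ == "__main__":
    words = ["a", "great", "movie"]
    for word, attribution in zip(words, loo_attributions(words, toy_score)):
        print(word, attribution)
```

With a real classifier, `score` would return the probability of the predicted class, which requires one extra forward pass per token.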
2 code implementations • EMNLP 2021 • JianGuo Zhang, Trung Bui, Seunghyun Yoon, Xiang Chen, Zhiwei Liu, Congying Xia, Quan Hung Tran, Walter Chang, Philip Yu
In this work, we focus on a more challenging few-shot intent detection scenario where many intents are fine-grained and semantically similar.
1 code implementation • EMNLP 2021 • Sangwoo Cho, Franck Dernoncourt, Tim Ganter, Trung Bui, Nedim Lipka, Walter Chang, Hailin Jin, Jonathan Brandt, Hassan Foroosh, Fei Liu
With the explosive growth of livestream broadcasting, there is an urgent need for new summarization technology that enables us to create a preview of streamed content and tap into this wealth of knowledge.
1 code implementation • ACL 2021 • Khalil Mrini, Franck Dernoncourt, Seunghyun Yoon, Trung Bui, Walter Chang, Emilia Farcas, Ndapa Nakashole
Users of medical question answering systems often submit long and detailed questions, making it hard to achieve high recall in answer retrieval.
no code implementations • 4 Jul 2021 • Tuan Manh Lai, Trung Bui, Doo Soon Kim
Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning.
1 code implementation • ACL 2021 • Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Kyomin Jung
Also, we observe critical problems of the previous benchmark dataset (i.e., human annotations) on image captioning metric, and introduce a new collection of human annotations on the generated captions.
1 code implementation • CVPR 2021 • Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu
Recently, language-guided global image editing draws increasing attention with growing application potentials.
1 code implementation • NAACL 2021 • Meryem M'hamdi, Doo Soon Kim, Franck Dernoncourt, Trung Bui, Xiang Ren, Jonathan May
We extensively evaluate our framework on two challenging cross-lingual NLU tasks: multilingual task-oriented dialog and typologically diverse question answering.
1 code implementation • NAACL 2021 • Tuan Lai, Heng Ji, Trung Bui, Quan Hung Tran, Franck Dernoncourt, Walter Chang
Event coreference resolution is an important research problem with many applications.
no code implementations • Findings (ACL) 2021 • Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen
Encouraging classifiers to capture word order information improves the performance on most GLUE tasks, SQuAD 2.0 and out-of-samples.
Natural Language Inference
Natural Language Understanding
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Nham Le, Tuan Lai, Trung Bui, Doo Soon Kim
With the renaissance of deep learning, neural networks have achieved promising results on many natural language understanding (NLU) tasks.
no code implementations • COLING 2020 • Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran
Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Xuanli He, Quan Hung Tran, Gholamreza Haffari, Walter Chang, Trung Bui, Zhe Lin, Franck Dernoncourt, Nhan Dam
In this paper, we explore the novel problem of graph modification, where the systems need to learn how to update an existing scene graph given a new user's command.
no code implementations • 5 Oct 2020 • Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu
To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations.
1 code implementation • CVPR 2020 • Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji
We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.
Ranked #3 on Referring Expression Segmentation on PhraseCut
no code implementations • 2 Aug 2020 • Lidan Wang, Franck Dernoncourt, Trung Bui
The performance of many machine learning models depends on their hyper-parameter settings.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Tuan Manh Lai, Trung Bui, Nedim Lipka
Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people.
no code implementations • NAACL 2021 • Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han
Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.
no code implementations • WS 2020 • Anthony Colas, Trung Bui, Franck Dernoncourt, Moumita Sinha, Doo Soon Kim
Many users communicate with chatbots and AI assistants in order to help them with various tasks.
2 code implementations • ACL 2020 • Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser
Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.
1 code implementation • NAACL 2021 • Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Joongbo Shin, Kyomin Jung
To evaluate our metric, we create high-quality human judgments of correctness on two GenQA datasets.
no code implementations • 1 Apr 2020 • Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
Audio Visual Scene-aware Dialog (AVSD) is the task of generating a response for a question with a given scene, video, audio, and the history of previous turns in the dialog.
no code implementations • 16 Feb 2020 • Tzu-Hsiang Lin, Trung Bui, Doo Soon Kim, Jean Oh
In this paper, we present a multimodal dialogue system for Conversational Image Editing.
1 code implementation • LREC 2020 • Tzu-Hsiang Lin, Alexander Rudnicky, Trung Bui, Doo Soon Kim, Jean Oh
Our system grounds language on the level of edit operations, and suggests options for a user to choose from.
1 code implementation • EMNLP 2020 • Kang Min Yoo, Hanbit Lee, Franck Dernoncourt, Trung Bui, Walter Chang, Sang-goo Lee
Recent works have shown that generative data augmentation, where synthetic samples generated from deep generative models complement the training dataset, benefits NLP tasks.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Khalil Mrini, Franck Dernoncourt, Quan Tran, Trung Bui, Walter Chang, Ndapa Nakashole
Finally, we find that the Label Attention heads learn relations between syntactic categories and show pathways to analyze errors.
Ranked #1 on Dependency Parsing on Penn Treebank
no code implementations • 28 Oct 2019 • Tuan Manh Lai, Quan Hung Tran, Trung Bui, Daisuke Kihara
In a task-oriented dialog system, the goal of dialog state tracking (DST) is to monitor the state of the conversation from the dialog history.
Ranked #4 on Dialogue State Tracking on Wizard-of-Oz
1 code implementation • IJCNLP 2019 • Tuan Lai, Quan Hung Tran, Trung Bui, Daisuke Kihara
Answer selection is an important research problem, with applications in many areas.
1 code implementation • LREC 2020 • Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
In this study, we propose a novel graph neural network called propagate-selector (PS), which propagates information over sentences to understand information that cannot be inferred when considering sentences in isolation.
no code implementations • 8 Aug 2019 • Subhadeep Dey, Petr Motlicek, Trung Bui, Franck Dernoncourt
In this paper, we explore various approaches for semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework.
Automatic Speech Recognition (ASR)
1 code implementation • ACL 2019 • Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal
To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions.
no code implementations • 30 May 2019 • Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung
In this paper, we propose a novel method for a sentence-level answer-selection task that is a fundamental problem in natural language processing.
Ranked #8 on Question Answering on TrecQA
no code implementations • 30 Mar 2019 • Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg
This work presents computational methods for transferring body movements from one person to another with videos collected in the wild.
no code implementations • 8 Jan 2019 • Tuan Manh Lai, Trung Bui, Nedim Lipka, Sheng Li
Popular e-commerce websites such as Amazon offer community question answering systems where users can pose product-related questions and experienced customers may voluntarily provide answers.
no code implementations • 3 Dec 2018 • Jacqueline Brixey, Ramesh Manuvinakurike, Nham Le, Tuan Lai, Walter Chang, Trung Bui
This work presents the task of modifying images in an image editing program using natural language written commands.
no code implementations • COLING 2018 • Tuan Manh Lai, Trung Bui, Sheng Li
Given a question and a set of candidate answers, answer selection is the task of identifying which of the candidates answers the question correctly.
no code implementations • WS 2018 • Ramesh Manuvinakurike, Trung Bui, Walter Chang, Kallirroi Georgila
We present "conversational image editing", a novel real-world application domain combining dialogue, visual information, and the use of computer vision.
no code implementations • WS 2018 • Tuan Lai, Trung Bui, Sheng Li, Nedim Lipka
When evaluating a potential product purchase, customers may have many questions in mind.
no code implementations • NAACL 2018 • Quan Hung Tran, Tuan Lai, Gholamreza Haffari, Ingrid Zukerman, Trung Bui, Hung Bui
Contextual sequence mapping is one of the fundamental problems in Natural Language Processing (NLP).
2 code implementations • NAACL 2018 • Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian
Neural abstractive summarization models have led to promising results in summarizing relatively short documents.
Ranked #4 on Unsupervised Extractive Summarization on Pubmed
Abstractive Text Summarization
Unsupervised Extractive Summarization
3 code implementations • CVPR 2018 • Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg
As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world.
2 code implementations • CVPR 2017 • Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia
According to the intent of the query, an attention mechanism can be introduced to adaptively balance the importance of different modalities.
no code implementations • 20 Oct 2016 • Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang
One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA) which attempts to evaluate a system's visual understanding through its answers to natural language questions about images.