Search Results for author: Pascale Fung

Found 176 papers, 80 papers with code

Korean Language Modeling via Syntactic Guide

no code implementations • LREC 2022 • Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak, Pascale Fung

Our experimental results show that the proposed methods improve model performance on the investigated Korean language understanding tasks.

Language Modelling POS


CAiRE in DialDoc21: Data Augmentation for Information Seeking Dialogue System

1 code implementation • ACL (dialdoc) 2021 • Yan Xu, Etsuko Ishii, Genta Indra Winata, Zhaojiang Lin, Andrea Madotto, Zihan Liu, Peng Xu, Pascale Fung

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative responses based on users’ needs.

Data Augmentation Response Generation


Preserving Cross-Linguality of Pre-trained Models via Continual Learning

no code implementations • ACL (RepL4NLP) 2021 • Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results.

Continual Learning named-entity-recognition +5


Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • RepL4NLP (ACL) 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Zeng Min, Samuel Cahyawijaya, Dan Su, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task.

Data Augmentation Machine Reading Comprehension +1


Dimsum @LaySumm 20

1 code implementation • EMNLP (sdp) 2020 • Tiezheng Yu, Dan Su, Wenliang Dai, Pascale Fung

Lay summarization aims to generate lay summaries of scientific papers automatically.

Lay Summarization Sentence


CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

1 code implementation • LREC 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1


Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach

no code implementations • ACL 2022 • Etsuko Ishii, Bryan Wilie, Yan Xu, Samuel Cahyawijaya, Pascale Fung

Resolving dependencies among dialogue history is one of the main obstacles in the research on conversational question answering (QA).

Conversational Question Answering reinforcement-learning +1


High-Dimension Human Value Representation in Large Language Models

no code implementations • 11 Apr 2024 • Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung

There is an urgent need to understand the scope and nature of human values injected into these models before their release.

Language Modelling


Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages

no code implementations • 9 Apr 2024 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes.


Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

no code implementations • 27 Mar 2024 • Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues.


LLMs Are Few-Shot In-Context Low-Resource Language Learners

no code implementations • 25 Mar 2024 • Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages.

In-Context Learning


Subobject-level Image Tokenization

1 code implementation • 22 Feb 2024 • Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung

Transformer-based vision models typically tokenize images into fixed-size square patches as input units, which lacks the adaptability to image content and overlooks the inherent pixel grouping structure.

Attribute Language Modelling +1


RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

1 code implementation • 7 Dec 2023 • Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa

Under a unified evaluation of fine-tuned LMs by incorporating four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs, which indicates its usefulness in practice.

Adversarial Robustness


IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

no code implementations • 21 Nov 2023 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, Ayu Purwarianti

Significant progress has been made on Indonesian NLP.


Mitigating Framing Bias with Polarity Minimization Loss

no code implementations • 3 Nov 2023 • Yejin Bang, Nayeon Lee, Pascale Fung

Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events.

Document Summarization Multi-Document Summarization


Contrastive Learning for Inference in Dialogue

1 code implementation • 19 Oct 2023 • Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung

Inference, especially inference derived from inductive processes, is a crucial component of conversation, complementing the information implicitly or explicitly conveyed by a speaker.

Contrastive Learning


InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

1 code implementation • 13 Oct 2023 • Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung

We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning.

Dialogue State Tracking Informativeness +4


Towards Mitigating Hallucination in Large Language Models via Self-Reflection

no code implementations • 10 Oct 2023 • Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung

Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.

Answer Generation Hallucination +1


Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

no code implementations • 9 Oct 2023 • Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects.

Hallucination Object +2


Survey of Social Bias in Vision-Language Models

no code implementations • 24 Sep 2023 • Nayeon Lee, Yejin Bang, Holy Lovenia, Samuel Cahyawijaya, Wenliang Dai, Pascale Fung

This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.

Fairness


NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

1 code implementation • 19 Sep 2023 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.

Document Translation Translation


PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems

1 code implementation • 19 Sep 2023 • Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses.

Hallucination Language Modelling +1


Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge

no code implementations • 5 Sep 2023 • Tiezheng Yu, Ziwei Ji, Pascale Fung

Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a given meeting transcript conditioned upon a query.

Meeting Summarization


Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

1 code implementation • 26 Jun 2023 • Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

In this work, we analyze the transferability of emotion recognition across three languages (English, Mandarin Chinese, and Cantonese) and two age groups (adults and the elderly).

Data Augmentation Speech Emotion Recognition


Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

no code implementations • 6 Jun 2023 • Irina-Elena Veliche, Pascale Fung

In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing Equality and Equity for all user groups for whom systems do not perform well.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4


Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

1 code implementation • 1 Jun 2023 • Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system.

Dialogue Generation Response Generation


InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

1 code implementation • 23 May 2023 • Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting.


InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

2 code implementations • NeurIPS 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi

Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.

Ranked #5 on visual instruction following on LLaVA-Bench

Video Question Answering visual instruction following +1

8,701


Learn What NOT to Learn: Towards Generative Safety in Chatbots

no code implementations • 21 Apr 2023 • Leila Khalatbari, Yejin Bang, Dan Su, Willy Chung, Saeed Ghadimi, Hossein Sameti, Pascale Fung

Our approach differs from the standard contrastive learning framework in that it automatically obtains positive and negative signals from the safe and unsafe language distributions that have been learned beforehand.

Contrastive Learning


Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

1 code implementation • 28 Feb 2023 • Holy Lovenia, Samuel Cahyawijaya, Pascale Fung

The demand for multimodal dialogue systems has been rising in various domains, emphasizing the importance of interpreting multimodal inputs from conversational and situational contexts.


A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

1 code implementation • 8 Feb 2023 • Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

ChatGPT is, for example, better at deductive than inductive reasoning.

Code Generation Hallucination +4


NusaCrowd: Open Source Initiative for Indonesian NLP Resources

1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

253


RHO ($ρ$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

1 code implementation • 3 Dec 2022 • Ziwei Ji, Zihan Liu, Nayeon Lee, Tiezheng Yu, Bryan Wilie, Min Zeng, Pascale Fung

Dialogue systems can leverage large pre-trained language models and knowledge to generate fluent and informative responses.

Hallucination Representation Learning +1


Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness

no code implementations • 10 Nov 2022 • Caner Hazirbas, Yejin Bang, Tiezheng Yu, Parisa Assar, Bilal Porgali, Vítor Albiero, Stefan Hermanek, Jacqueline Pan, Emily McReynolds, Miranda Bogen, Pascale Fung, Cristian Canton Ferrer

Developing robust and fair AI systems requires datasets with a comprehensive set of labels that can help ensure the validity and legitimacy of relevant measurements.

Fairness


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations • 9 Nov 2022 • BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben Allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo Wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling Multilingual NLP

2,186


How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

1 code implementation • 25 Oct 2022 • Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, Pascale Fung

Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA.

Language Modelling


Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

1 code implementation • 14 Oct 2022 • Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung

Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information.

Hallucination Image Augmentation +3


Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

no code implementations • 14 Oct 2022 • Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung

Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.

Classification Few-Shot Learning +1


Context Generation Improves Open Domain Question Answering

no code implementations • 12 Oct 2022 • Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge.

Language Modelling Open-Domain Question Answering


Every picture tells a story: Image-grounded controllable stylistic story generation

no code implementations • LaTeCHCLfL (COLING) 2022 • Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

Generating a short story out of an image is arduous.

Image Captioning Story Generation


Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

no code implementations • 6 Jul 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1


Factuality Enhanced Language Models for Open-Ended Text Generation

3 code implementations • 9 Jun 2022 • Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro

In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.

Misconceptions Sentence +2

5,446


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

2,649


NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder

In this work, we focus on developing resources for languages in Indonesia.

Machine Translation Translation

Paper
Code

ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

no code implementations • 25 May 2022 • Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.

Cultural Vocal Bursts Intensity Prediction Few-Shot Learning +1

Paper
Add Code

Towards Answering Open-ended Ethical Quandary Questions

no code implementations • 12 May 2022 • Yejin Bang, Nayeon Lee, Tiezheng Yu, Leila Khalatbari, Yan Xu, Samuel Cahyawijaya, Dan Su, Bryan Wilie, Romain Barraud, Elham J. Barezi, Andrea Madotto, Hayden Kee, Pascale Fung

We explore the current capability of LLMs in providing an answer with a deliberative exchange of different perspectives to an ethical quandary, in the approach of Socratic philosophy, instead of providing a closed answer like an oracle.

Few-Shot Learning Generative Question Answering +2

Paper
Add Code

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

1 code implementation • BioNLP (ACL) 2022 • Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort.

Genome Understanding

Paper
Code

NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias

1 code implementation • NAACL 2022 • Nayeon Lee, Yejin Bang, Tiezheng Yu, Andrea Madotto, Pascale Fung

Based on our discovery that the title provides a good signal for framing bias, we present NeuS-TITLE, which learns to neutralize news content in hierarchical order from title to article.

Multi-Task Learning News Summarization

Paper
Code

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • 30 Mar 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Min Zeng, Samuel Cahyawijaya, Su Dan, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task.

Data Augmentation Machine Reading Comprehension +1

Paper
Add Code

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

no code implementations • Findings (ACL) 2022 • Wenliang Dai, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Furthermore, the original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.

Image Captioning Knowledge Distillation +4

Paper
Add Code

VScript: Controllable Script Generation with Visual Presentation

no code implementations • 1 Mar 2022 • Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

In order to offer a customized script tool and inspire professional scriptwriters, we present VScript.

Dialogue Generation Retrieval +1

Paper
Add Code

Read before Generate! Faithful Long Form Question Answering with Machine Reading

no code implementations • Findings (ACL) 2022 • Dan Su, Xiaoguang Li, Jindi Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Pascale Fung

Long-form question answering (LFQA) aims to generate a paragraph-length answer for a given question.

Ranked #1 on Question Answering on KILT: ELI5

Answer Generation Long Form Question Answering +1

Paper
Add Code

QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

1 code implementation • 14 Feb 2022 • Dan Su, Peng Xu, Pascale Fung

Multi-hop question generation (MQG) aims to generate complex questions which require reasoning over multiple pieces of information of the input passage.

Multi-hop Question Answering Question Answering +2

Paper
Code

Survey of Hallucination in Natural Language Generation

no code implementations • 8 Feb 2022 • Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Ho Shu Chan, Wenliang Dai, Andrea Madotto, Pascale Fung

This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation.

Abstractive Text Summarization Data-to-Text Generation +4

Paper
Add Code

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

1 code implementation • 11 Jan 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Code

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

1 code implementation • LREC 2022 • Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

2 code implementations • LREC 2022 • Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.

120

Paper
Code

NER-BERT: A Pre-trained Model for Low-Resource Entity Tagging

no code implementations • 1 Dec 2021 • Zihan Liu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

Named entity recognition (NER) models generally perform poorly when large training datasets are unavailable for low-resource domains.

Language Modelling named-entity-recognition +2

Paper
Add Code

Confucius, Cyberpunk and Mr. Science: Comparing AI ethics between China and the EU

no code implementations • 15 Nov 2021 • Pascale Fung, Hubert Etienne

Even when people from different cultures happen to agree on a set of common principles, it does not necessarily mean that they share the same understanding of these concepts and what they entail.

Ethics

Paper
Add Code

Few-Shot Bot: Prompt-Based Learning for Dialogue Systems

2 code implementations • 15 Oct 2021 • Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, Pascale Fung

A simple yet unexplored solution is prompt-based few-shot learning (Brown et al., 2020), which does not require gradient-based fine-tuning but instead uses a few examples in the LM context as the only source of learning.

Chatbot Dialogue State Tracking +3

749

Paper
Code

Language Models are Few-shot Multilingual Learners

1 code implementation • EMNLP (MRL) 2021 • Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, Pascale Fung

General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples.

Multi-class Classification

Paper
Code

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

no code implementations • 14 Sep 2021 • Samuel Cahyawijaya, Genta Indra Winata, Holy Lovenia, Bryan Wilie, Wenliang Dai, Etsuko Ishii, Pascale Fung

While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably.

Paper
Add Code

Zero-Shot Dialogue State Tracking via Cross-Task Transfer

1 code implementation • EMNLP 2021 • Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, Pascale Fung

Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data.

Dialogue State Tracking Question Answering +1

Paper
Code

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization

1 code implementation • EMNLP 2021 • Tiezheng Yu, Wenliang Dai, Zihan Liu, Pascale Fung

Multimodal abstractive summarization (MAS) models that summarize videos (vision modality) and their corresponding transcripts (text modality) are able to extract the essential information from massive multimodal data on the Internet.

Abstractive Text Summarization Text Generation

Paper
Code

Assessing Political Prudence of Open-domain Chatbots

1 code implementation • SIGDIAL (ACL) 2021 • Yejin Bang, Nayeon Lee, Etsuko Ishii, Andrea Madotto, Pascale Fung

In this work, as a first step towards a politically safe chatbot, we propose a group of metrics for assessing their political prudence.

Chatbot

Paper
Code

CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System

1 code implementation • 7 Jun 2021 • Etsuko Ishii, Yan Xu, Genta Indra Winata, Zhaojiang Lin, Andrea Madotto, Zihan Liu, Peng Xu, Pascale Fung

Data Augmentation Response Generation

Paper
Code

X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing

1 code implementation • ACL (RepL4NLP) 2021 • Zihan Liu, Genta Indra Winata, Peng Xu, Pascale Fung

Experimental results illustrate that our model can significantly outperform existing strong baselines in cross-lingual and cross-domain settings, and our model can also achieve a good generalization ability on target languages of target domains.

Semantic Parsing

Paper
Code

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

1 code implementation • 5 Jun 2021 • Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions.

Cross-Lingual Transfer Transfer Learning

Paper
Code

ERICA: An Empathetic Android Companion for Covid-19 Quarantine

no code implementations • SIGDIAL (ACL) 2021 • Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, Pascale Fung

Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight against the COVID-19 pandemic, yet such research has just started on dialogue systems.

Paper
Add Code

Nora: The Well-Being Coach

no code implementations • 1 Jun 2021 • Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, Pascale Fung

The current pandemic has forced people globally to remain in isolation and practice social distancing, which creates the need for a system to combat the resulting loneliness and negative emotions.

Natural Language Understanding

Paper
Add Code

Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.

Denoising Machine Translation +2

Paper
Code

Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance

1 code implementation • Findings (ACL) 2021 • Dan Su, Tiezheng Yu, Pascale Fung

Query focused summarization (QFS) models aim to generate summaries from source documents that can answer the given query.

Abstractive Text Summarization Query-focused Summarization +1

Paper
Code

QAConv: Question Answering on Informative Conversations

1 code implementation • ACL 2022 • Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong

This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source.

Question Answering

Paper
Code

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

1 code implementation • dialdoc (ACL) 2022 • Yan Xu, Etsuko Ishii, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung

This paper proposes KnowExpert, a framework to bypass the explicit retrieval process and inject knowledge into the pre-trained language models with lightweight adapters and adapt to the knowledge-grounded dialogue task.

Response Generation Retrieval

Paper
Code

Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation

no code implementations • Findings (ACL) 2021 • Zihan Liu, Genta Indra Winata, Pascale Fung

The data scarcity in low-resource languages has become a bottleneck to building robust neural machine translation systems.

Low-Resource Neural Machine Translation NMT +1

Paper
Add Code

Weakly-supervised Multi-task Learning for Multimodal Affect Recognition

no code implementations • 23 Apr 2021 • Wenliang Dai, Samuel Cahyawijaya, Yejin Bang, Pascale Fung

In this paper, we propose to leverage these datasets using weakly-supervised multi-task learning to improve the generalization performance on each of them.

Emotion Recognition Multi-Task Learning +1

Paper
Add Code

Dynamically Addressing Unseen Rumor via Continual Learning

no code implementations • 18 Apr 2021 • Nayeon Lee, Andrea Madotto, Yejin Bang, Pascale Fung

Rumors are often associated with newly emerging events; thus, the ability to deal with unseen rumors is crucial for a rumor veracity classification model.

Continual Learning Veracity Classification

Paper
Add Code

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

2 code implementations • EMNLP 2021 • Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, Pascale Fung

Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems.

Machine Translation Question Answering +2

Paper
Code

On Unifying Misinformation Detection

1 code implementation • NAACL 2021 • Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa

In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup.

Few-Shot Learning Misinformation

Paper
Code

Mitigating Media Bias through Neutral Article Generation

no code implementations • 1 Apr 2021 • Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung

Media bias can lead to increased political polarization, and thus, the need for automatic mitigation methods is growing.

Paper
Add Code

Are Multilingual Models Effective in Code-Switching?

no code implementations • NAACL (CALCS) 2021 • Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Pascale Fung

Multilingual language models have shown decent performance in multilingual and cross-lingual natural language understanding tasks.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization

1 code implementation • NAACL 2021 • Tiezheng Yu, Zihan Liu, Pascale Fung

State-of-the-art abstractive summarization models generally rely on extensive labeled data, which lowers their generalization ability on domains where such data are not available.

Abstractive Text Summarization Domain Adaptation

Paper
Code

Towards Few-Shot Fact-Checking via Perplexity

no code implementations • NAACL 2021 • Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung

Through experiments, we empirically verify the plausibility of the rather surprising usage of the perplexity score in the context of fact-checking and highlight the strength of our few-shot methodology by comparing it to strong fine-tuning-based baseline models.

Fact Checking Few-Shot Learning +5

Paper
Add Code

Multimodal End-to-End Sparse Model for Emotion Recognition

1 code implementation • NAACL 2021 • Wenliang Dai, Samuel Cahyawijaya, Zihan Liu, Pascale Fung

Existing works on multimodal affective computing tasks, such as emotion recognition, generally adopt a two-phase pipeline, first extracting feature representations for each single modality with hand-crafted algorithms and then performing end-to-end learning with the extracted features.

Emotion Recognition

Paper
Code

Model Generalization on COVID-19 Fake News Detection

no code implementations • 11 Jan 2021 • Yejin Bang, Etsuko Ishii, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Amid the COVID-19 pandemic, the world is facing an unprecedented infodemic with the proliferation of both fake and real information.

Fake News Detection Misinformation

Paper
Add Code

CrossNER: Evaluating Cross-Domain Named Entity Recognition

5 code implementations • 8 Dec 2020 • Zihan Liu, Yan Xu, Tiezheng Yu, Wenliang Dai, Ziwei Ji, Samuel Cahyawijaya, Andrea Madotto, Pascale Fung

Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains.

Cross-Domain Named Entity Recognition Domain Adaptation +3

115

Paper
Code

A Study on the Autoregressive and non-Autoregressive Multi-label Learning

no code implementations • 3 Dec 2020 • Elham J. Barezi, Iacer Calixto, Kyunghyun Cho, Pascale Fung

These tasks are hard because the label space is usually (i) very large, e.g., thousands or millions of labels, (ii) very sparse, i.e., very few labels apply to each input document, and (iii) highly correlated, meaning that the existence of one label changes the likelihood of predicting all other labels.

Multi-Label Learning

Paper
Add Code

Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection

1 code implementation • SEMEVAL 2020 • Wenliang Dai, Tiezheng Yu, Zihan Liu, Pascale Fung

Nowadays, offensive content in social media has become a serious problem, and automatically detecting offensive language is an essential task.

Language Modelling Multi-Task Learning

Paper
Code

Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization

1 code implementation • 19 Oct 2020 • Tiezheng Yu, Dan Su, Wenliang Dai, Pascale Fung

Lay summarization aims to generate lay summaries of scientific papers automatically.

Document Summarization Lay Summarization +1

Paper
Code

Multi-hop Question Generation with Graph Convolutional Network

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Dan Su, Yan Xu, Wenliang Dai, Ziwei Ji, Tiezheng Yu, Pascale Fung

Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered pieces of evidence from different paragraphs.

Question Generation Question-Generation +1

Paper
Code

Plug-and-Play Conversational Models

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Andrea Madotto, Etsuko Ishii, Zhaojiang Lin, Sumanth Dathathri, Pascale Fung

These large conversational models provide little control over the generated responses, and this control is further limited in the absence of annotated conversational datasets for attribute-specific generation that can be used for fine-tuning the model.

Attribute Language Modelling +2

Paper
Code

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

no code implementations • EMNLP 2020 • Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro

We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process.

Sentence Sentence Embedding +2

Paper
Add Code

Cross-lingual Spoken Language Understanding with Regularized Representation Alignment

1 code implementation • EMNLP 2020 • Zihan Liu, Genta Indra Winata, Peng Xu, Zhaojiang Lin, Pascale Fung

Despite the promising results of current cross-lingual models for spoken language understanding systems, they still suffer from imperfect cross-lingual representation alignments between the source and target languages, which makes the performance sub-optimal.

Sentence Spoken Language Understanding

Paper
Code

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung

In this paper, we propose a method to embed the KB, of any size, directly into the model parameters.

Dialogue State Tracking Management +1

Paper
Code

MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems

1 code implementation • EMNLP 2020 • Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung

In this paper, we propose Minimalist Transfer Learning (MinTL) to simplify the system design process of task-oriented dialogue systems and alleviate the over-dependency on annotated data.

Ranked #15 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.1

Dialogue State Tracking Multi-domain Dialogue State Tracking +3

Paper
Code

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung

Despite the recent achievements made in the multi-modal emotion recognition task, two problems still exist and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially unseen emotions.

Multimodal Emotion Recognition Word Embeddings

Paper
Code

IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

3 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti

Although Indonesian is known to be the fourth most frequently used language on the internet, research progress on this language in natural language processing (NLP) is slow-moving due to a lack of available resources.

Benchmarking Natural Language Understanding +2

496

Paper
Code

The Adapter-Bot: All-In-One Controllable Conversational Model

1 code implementation • 28 Aug 2020 • Andrea Madotto, Zhaojiang Lin, Yejin Bang, Pascale Fung

The dialogue skills can be triggered automatically via a dialogue manager, or manually, thus allowing high-level control of the generated responses.

Movie Recommendation

Paper
Code

EmoGraph: Capturing Emotion Correlations using Graph Networks

no code implementations • 21 Aug 2020 • Peng Xu, Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Pascale Fung

Most emotion recognition methods tackle the emotion understanding task by considering individual emotions independently while ignoring their fuzzy nature and the interconnections among them.

Ranked #3 on Emotion Classification on SemEval 2018 Task 1E-c

Classification Emotion Classification +3

Paper
Add Code

Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems

no code implementations • 14 Aug 2020 • Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung

In this paper, we evaluate the priming few-shot ability of language models in the NLU, DST, DP and NLG tasks.

Dialogue State Tracking Few-Shot Learning +4

Paper
Add Code

Misinformation Has High Perplexity

1 code implementation • 8 Jun 2020 • Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung

Debunking misinformation is an important and time-critical task as there could be adverse consequences when misinformation is not quashed promptly.

Language Modelling Misinformation +3

Paper
Code

Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction

no code implementations • CL 2020 • Marta R. Costa-jussà, Cristina España-Bonet, Pascale Fung, Noah A. Smith

We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing.

Paper
Add Code

CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management

1 code implementation • EMNLP (NLP-COVID19) 2020 • Dan Su, Yan Xu, Tiezheng Yu, Farhad Bin Siddique, Elham J. Barezi, Pascale Fung

We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research Dataset Challenge, judged by medical experts.

Document Summarization Information Retrieval +3

Paper
Code

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

no code implementations • 29 Apr 2020 • Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results.

Continual Learning named-entity-recognition +5

Paper
Add Code

Meta-Transfer Learning for Code-Switched Speech Recognition

1 code implementation • ACL 2020 • Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

An increasing number of people in the world today speak a mixed language as a result of being multilingual.

Language Modelling speech-recognition +2

Paper
Code

Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection

1 code implementation • 28 Apr 2020 • Wenliang Dai, Tiezheng Yu, Zihan Liu, Pascale Fung

Nowadays, offensive content in social media has become a serious problem, and automatically detecting offensive language is an essential task.

Abuse Detection Language Modelling +1

Paper
Code

Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling

1 code implementation • ACL 2020 • Zihan Liu, Genta Indra Winata, Peng Xu, Pascale Fung

In this paper, we propose a Coarse-to-fine approach (Coach) for cross-domain slot filling.

Cross-Domain Named Entity Recognition named-entity-recognition +3

Paper
Code

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhaojiang Lin, Andrea Madotto, Pascale Fung

Fine-tuning pre-trained generative language models to downstream language generation tasks has shown promising results.

Language Modelling Text Generation +1

Paper
Code

Variational Transformers for Diverse Response Generation

2 code implementations • 28 Mar 2020 • Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, Pascale Fung

Despite the great promise of Transformers in many sequence modeling tasks (e.g., machine translation), their deterministic nature hinders them from generalizing to high-entropy tasks such as dialogue response generation.

Machine Translation Response Generation +1

Paper
Code

XPersona: Evaluating Multilingual Personalized Chatbot

1 code implementation • EMNLP (NLP4ConvAI) 2021 • Zhaojiang Lin, Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Yejin Bang, Etsuko Ishii, Pascale Fung

Experimental results show that the multilingual trained models outperform the translation-pipeline and that they are on par with the monolingual models, with the advantage of having a single model across multiple languages.

Chatbot Translation

Paper
Code

Learning Fast Adaptation on Cross-Accented Speech Recognition

1 code implementation • 4 Mar 2020 • Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system.

Audio and Speech Processing Sound

Paper
Code

Zero-Resource Cross-Domain Named Entity Recognition

1 code implementation • WS 2020 • Zihan Liu, Genta Indra Winata, Pascale Fung

Existing models for cross-domain named entity recognition (NER) rely on numerous unlabeled corpora or labeled NER training data in target domains.

Ranked #1 on Cross-Domain Named Entity Recognition on CoNLL04

Cross-Domain Named Entity Recognition Domain Adaptation +4

Paper
Code

On the Importance of Word Order Information in Cross-lingual Sequence Labeling

no code implementations • 30 Jan 2020 • Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, Pascale Fung

To verify this hypothesis, we investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

Attention over Parameters for Dialogue Systems

no code implementations • 7 Jan 2020 • Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Jamin Shin, Pascale Fung

Dialogue systems require a great deal of different but complementary expertise to assist, inform, and entertain humans.

Goal-Oriented Dialogue Systems

Paper
Add Code

Attention-Informed Mixed-Language Training for Zero-shot Cross-lingual Task-oriented Dialogue Systems

1 code implementation • 21 Nov 2019 • Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Peng Xu, Pascale Fung

Recently, data-driven task-oriented dialogue systems have achieved promising performance in English.

Dialogue State Tracking Intent Detection +4

Paper
Code

Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables

no code implementations • IJCNLP 2019 • Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, Pascale Fung

Despite the surging demand for multilingual task-oriented dialog systems (e.g., Alexa, Google Home), there has been less research done in multilingual or cross-lingual scenarios.

Intent Detection Natural Language Understanding +2

Paper
Add Code

Generalizing Question Answering System with Pre-trained Language Model Fine-tuning

no code implementations • WS 2019 • Dan Su, Yan Xu, Genta Indra Winata, Peng Xu, Hyeondey Kim, Zihan Liu, Pascale Fung

With a large number of datasets being released and new techniques being proposed, question answering (QA) systems have witnessed great breakthroughs in reading comprehension (RC) tasks.

Language Modelling Multi-Task Learning +2

Paper
Add Code

Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer

no code implementations • 30 Oct 2019 • Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Pascale Fung

Highly performing deep neural networks come at the cost of computational complexity that limits their practicality for deployment on portable devices.

Language Modelling speech-recognition +1

Paper
Add Code

Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition

1 code implementation • IJCNLP 2019 • Genta Indra Winata, Zhaojiang Lin, Jamin Shin, Zihan Liu, Pascale Fung

In countries that speak multiple main languages, mixing up different languages within a conversation is commonly called code-switching.

Ranked #1 on Named Entity Recognition (NER) on Code-Switching English-Spanish NER

named-entity-recognition Named Entity Recognition +2

Paper
Code

Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

no code implementations • CONLL 2019 • Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning

1 code implementation • IJCNLP 2019 • Peng Xu, Chien-Sheng Wu, Andrea Madotto, Pascale Fung

Sensational headlines are headlines that capture people's attention and generate reader interest.

Headline Generation reinforcement-learning +1

Paper
Code

On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression

no code implementations • 27 Aug 2019 • Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J. Barezi, Pascale Fung

Despite their ubiquity in NLP tasks, Long Short-Term Memory (LSTM) networks suffer from computational inefficiencies caused by inherent unparallelizable recurrences, a problem that worsens as LSTMs require more parameters for larger memory capacity.

Model Compression

Paper
Add Code

MoEL: Mixture of Empathetic Listeners

5 code implementations • IJCNLP 2019 • Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung

Previous research on empathetic dialogue systems has mostly focused on generating responses given certain emotions.

Paper
Code

Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring

no code implementations • WS 2019 • Zihan Liu, Yan Xu, Genta Indra Winata, Pascale Fung

This paper describes CAiRE's submission to the unsupervised machine translation track of the WMT'19 news shared task from German to Czech.

Language Modelling NMT +2

Paper
Add Code

Getting To Know You: User Attribute Extraction from Dialogues

1 code implementation • LREC 2020 • Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu, Pascale Fung

User attributes provide rich and useful information for user understanding, yet structured and easy-to-use attributes are often sparsely populated.

Attribute Attribute Extraction +1

Paper
Code

Learning to Learn Sales Prediction with Social Media Sentiment

no code implementations • WS 2019 • Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Zihan Liu, Yan Xu, Cong Gao, Pascale Fung

Paper
Add Code

Understanding the Shades of Sexism in Popular TV Series

no code implementations • WS 2019 • Nayeon Lee, Yejin Bang, Jamin Shin, Pascale Fung

[Multiple-submission] In the midst of a generation widely exposed to and influenced by media entertainment, the NLP research community has paid relatively little attention to the sexist comments in popular TV series.

valid

Paper
Add Code

Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition

no code implementations • WS 2019 • Genta Indra Winata, Zhaojiang Lin, Pascale Fung

In this paper, we propose Multilingual Meta-Embeddings (MME), an effective method to learn multilingual representations by leveraging monolingual pre-trained embeddings.

Language Identification named-entity-recognition +2

Paper
Add Code

Exploring Social Bias in Chatbots using Stereotype Knowledge

no code implementations • WS 2019 • Nayeon Lee, Andrea Madotto, Pascale Fung

Exploring social bias in chatbots is an important, yet relatively unexplored problem.

Chatbot

Paper
Add Code

CAiRE: An Empathetic Neural Chatbot

2 code implementations • 28 Jul 2019 • Zhaojiang Lin, Peng Xu, Genta Indra Winata, Farhad Bin Siddique, Zihan Liu, Jamin Shin, Pascale Fung

In this paper, we present an end-to-end empathetic conversation agent CAiRE.

Chatbot Empathetic Response Generation +2

Paper
Code

Generating Empathetic Responses by Looking Ahead the User's Sentiment

1 code implementation • 20 Jun 2019 • Jamin Shin, Peng Xu, Andrea Madotto, Pascale Fung

Hence, in this paper, we propose Sentiment Look-ahead, which is a novel perspective for empathy that models the future user emotional state.

Paper
Code

CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification

no code implementations • 10 Jun 2019 • Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Jamin Shin, Yan Xu, Peng Xu, Pascale Fung

Detecting emotion from dialogue is a challenge that has not yet been extensively surveyed.

Emotion Classification Gaussian Processes +1

Paper
Add Code

Team yeon-zi at SemEval-2019 Task 4: Hyperpartisan News Detection by De-noising Weakly-labeled Data

1 code implementation • SEMEVAL 2019 • Nayeon Lee, Zihan Liu, Pascale Fung

This paper describes our system that has been submitted to SemEval-2019 Task 4: Hyperpartisan News Detection.

Ensemble Learning

Paper
Code

CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification

no code implementations • SEMEVAL 2019 • Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Jamin Shin, Yan Xu, Peng Xu, Pascale Fung

Detecting emotion from dialogue is a challenge that has not yet been extensively surveyed.

Emotion Classification Gaussian Processes +1

Paper
Add Code

A Submodular Feature-Aware Framework for Label Subset Selection in Extreme Classification Problems

no code implementations • NAACL 2019 • Elham J. Barezi, Ian D. Wood, Pascale Fung, Hamid R. Rabiee

We can then efficiently solve the problem of multi-label learning with an intractably large number of interdependent labels, such as automatic tagging of Wikipedia pages.

General Classification Multi-Label Learning

Paper
Add Code

Personalizing Dialogue Agents via Meta-Learning

1 code implementation • ACL 2019 • Zhaojiang Lin, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Existing personalized dialogue models use human designed persona descriptions to improve dialogue consistency.

Dialogue Generation Meta-Learning

Paper
Code

Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems

2 code implementations • ACL 2019 • Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung

Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking.

Ranked #15 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.0

Dialogue State Tracking Multi-domain Dialogue State Tracking +2

Paper
Code

A novel repetition normalized adversarial reward for headline generation

no code implementations • 19 Feb 2019 • Peng Xu, Pascale Fung

While reinforcement learning can effectively improve language generation models, it often suffers from generating incoherent and repetitive phrases (Paulus et al., 2017).

Headline Generation reinforcement-learning +1

Paper
Add Code

Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets

no code implementations • 19 Jan 2019 • Dario Bertero, Onno Kampman, Pascale Fung

It outperforms a similar CNN using spectrograms as input by 12.8% for emotion and 6.3% for personality, based on F-scores.

Paper
Add Code

Modality-based Factorization for Multimodal Fusion

no code implementations • WS 2019 • Elham J. Barezi, Peyman Momeni, Pascale Fung

We propose a novel method, Modality-based Redundancy Reduction Fusion (MRRF), for understanding and modulating the relative contribution of each modality in multimodal inference tasks.

Emotion Recognition Personality Trait Recognition +1

Paper
Add Code

GlobalTrait: Personality Alignment of Multilingual Word Embeddings

no code implementations • 1 Nov 2018 • Farhad Bin Siddique, Dario Bertero, Pascale Fung

We propose a multilingual model to recognize Big Five Personality traits from text data in four different languages: English, Spanish, Dutch and Italian.

Multilingual Word Embeddings Personality Alignment

Paper
Add Code

Towards End-to-end Automatic Code-Switching Speech Recognition

no code implementations • 30 Oct 2018 • Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Speech recognition in mixed language has difficulty adapting to end-to-end frameworks due to the lack of data and overlapping phone sets, for example in words such as "one" in English and "wàn" in Chinese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Learning Comment Generation by Leveraging User-Generated Data

no code implementations • 29 Oct 2018 • Zhaojiang Lin, Genta Indra Winata, Pascale Fung

Existing models on open-domain comment generation are difficult to train, and they produce repetitive and uninteresting responses.

Comment Generation Information Retrieval +1

Paper
Add Code

Learn to Code-Switch: Data Augmentation using Copy Mechanism on Language Modeling

no code implementations • 24 Oct 2018 • Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Building large-scale datasets for training code-switching language models is challenging and very expensive.

Data Augmentation Language Modelling +1

Paper
Add Code

Improving Large-Scale Fact-Checking using Decomposable Attention Models and Lexical Tagging

no code implementations • EMNLP 2018 • Nayeon Lee, Chien-Sheng Wu, Pascale Fung

Fact-checking of textual sources needs to effectively extract relevant information from large knowledge bases.

Fact Checking Question Answering +2

Paper
Add Code

Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training

1 code implementation • WS 2018 • Peng Xu, Andrea Madotto, Chien-Sheng Wu, Ji Ho Park, Pascale Fung

In this paper, we propose Emo2Vec which encodes emotional semantics into vectors.

Ranked #28 on Sentiment Analysis on SST-5 Fine-grained classification

Abusive Language Classification +4

Paper
Code

Reducing Gender Bias in Abusive Language Detection

no code implementations • EMNLP 2018 • Ji Ho Park, Jamin Shin, Pascale Fung

In this work, we measure gender biases on models trained with different abusive language datasets, while analyzing the effect of different pre-trained word embeddings and model architectures.

Abusive Language Data Augmentation +1

Paper
Add Code

Investigating Audio, Video, and Text Fusion Methods for End-to-End Automatic Personality Prediction

no code implementations • ACL 2018 • Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung

Furthermore, we can see the prediction relevance of each modality for each trait.

Emotional Intelligence

Paper
Add Code

PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from emoji and #hashtags

no code implementations • SEMEVAL 2018 • Ji Ho Park, Peng Xu, Pascale Fung

This paper describes our system that has been submitted to SemEval-2018 Task 1: Affect in Tweets (AIT) to solve five subtasks.

BIG-bench Machine Learning Emotion Classification +4

Paper
Add Code

Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision

4 code implementations • 31 May 2018 • Genta Indra Winata, Onno Pepijn Kampman, Pascale Fung

The bidirectional LSTM model with attention is found to be the best model in terms of accuracy (74.1%) and f-score (74.3%).

Paper
Code

Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

no code implementations • WS 2018 • Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

Lack of text data has been the major issue in code-switching language modeling.

Language Modelling Multi-Task Learning +2

Paper
Add Code

Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition

no code implementations • WS 2018 • Genta Indra Winata, Chien-Sheng Wu, Andrea Madotto, Pascale Fung

We propose an LSTM-based model with a hierarchical architecture for named entity recognition from code-switching Twitter data.

named-entity-recognition Named Entity Recognition +2

Paper
Add Code

Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

no code implementations • 2 May 2018 • Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung

Furthermore, we can see the prediction relevance of each modality for each trait.

Emotional Intelligence

Paper
Add Code

Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems

1 code implementation • ACL 2018 • Andrea Madotto, Chien-Sheng Wu, Pascale Fung

End-to-end task-oriented dialog systems usually suffer from the challenge of incorporating knowledge bases.

Ranked #10 on Task-Oriented Dialogue Systems on KVRET

Task-Oriented Dialogue Systems

Paper
Code

PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from emoji and #hashtags

no code implementations • SEMEVAL 2018 • Ji Ho Park, Peng Xu, Pascale Fung

This paper describes our system that has been submitted to SemEval-2018 Task 1: Affect in Tweets (AIT) to solve five subtasks.

BIG-bench Machine Learning regression +1

Paper
Add Code

Cross-domain Dialogue Policy Transfer via Simultaneous Speech-act and Slot Alignment

no code implementations • 20 Apr 2018 • Kaixiang Mo, Yu Zhang, Qiang Yang, Pascale Fung

Also, they depend on either common slots or slot entropy, which are not available when the source and target slots are totally disjoint and no database is available to calculate the slot entropy.

Paper
Add Code

Fine Grained Knowledge Transfer for Personalized Task-oriented Dialogue Systems

no code implementations • 11 Nov 2017 • Kaixiang Mo, Yu Zhang, Qiang Yang, Pascale Fung

Training a personalized dialogue system requires a lot of data, and the data collected for a single user is usually insufficient.

Sentence Task-Oriented Dialogue Systems +1

Paper
Add Code

Zara Returns: Improved Personality Induction and Adaptation by an Empathetic Virtual Agent

no code implementations • ACL 2017 • Farhad Bin Siddique, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung

Word Embeddings

Paper
Add Code

One-step and Two-step Classification for Abusive Language Detection on Twitter

1 code implementation • WS 2017 • Ji Ho Park, Pascale Fung

Automatic abusive language detection is a difficult but important task for online social media.

Abuse Detection Abusive Language +4

Paper
Code

Zara: A Virtual Interactive Dialogue System Incorporating Emotion, Sentiment and Personality Recognition

no code implementations • COLING 2016 • Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Dario Bertero, Yan Wan, Ricky Ho Yin Chan, Chien-Sheng Wu

Zara, or 'Zara the Supergirl', is a virtual robot that can exhibit empathy while interacting with a user, with the aid of its built-in facial and emotion recognition, sentiment analysis, and speech module.

Emotion Recognition Feature Engineering +3

Paper
Add Code

Real-Time Speech Emotion and Sentiment Recognition for Interactive Dialogue Systems

no code implementations • EMNLP 2016 • Dario Bertero, Farhad Bin Siddique, Chien-Sheng Wu, Yan Wan, Ricky Ho Yin Chan, Pascale Fung

Dialogue Management Emotion Recognition +3

Paper
Add Code

A Long Short-Term Memory Framework for Predicting Humor in Dialogues

no code implementations • NAACL 2016 • Dario Bertero, Pascale Fung

Paper
Add Code

Zara The Supergirl: An Empathetic Personality Recognition System

no code implementations • NAACL 2016 • Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Yan Wan, Ho Yin Ricky Chan

Emotion Recognition Sentiment Analysis +1

Paper
Add Code

Towards Empathetic Human-Robot Interactions

no code implementations • 13 May 2016 • Pascale Fung, Dario Bertero, Yan Wan, Anik Dey, Ricky Ho Yin Chan, Farhad Bin Siddique, Yang Yang, Chien-Sheng Wu, Ruixi Lin

Although research on empathetic robots is still in the early stage, we described our approach using signal processing techniques, sentiment analysis and machine learning algorithms to make robots that can "understand" human emotion.

Sentiment Analysis

Paper
Add Code

Deep Learning of Audio and Language Features for Humor Prediction

no code implementations • LREC 2016 • Dario Bertero, Pascale Fung

Our work is a starting point for developing more effective machine learning and neural network models on the humor prediction task, as well as developing machines capable of understanding humor in general.

BIG-bench Machine Learning

Paper
Add Code

A Machine Learning based Music Retrieval and Recommendation System

no code implementations • LREC 2016 • Naziba Mostafa, Yan Wan, Unnayan Amitabh, Pascale Fung

In this paper, we present a music retrieval and recommendation system using machine learning techniques.

BIG-bench Machine Learning Retrieval

Paper
Add Code

HLTC-HKUST: A Neural Network Paraphrase Classifier using Translation Metrics, Semantic Roles and Lexical Similarity Features

no code implementations • SEMEVAL 2015 • Dario Bertero, Pascale Fung

Paraphrase Identification Translation

Paper
Add Code

Language Modeling with Functional Head Constraint for Code Switching Speech Recognition

no code implementations • EMNLP 2014 • Ying Li, Pascale Fung

Language Identification Language Modelling +2

Paper
Add Code

Overview for the First Shared Task on Language Identification in Code-Switched Data

no code implementations • WS 2014 • Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung

Language Identification

Paper
Add Code

A Hindi-English Code-Switching Corpus

no code implementations • LREC 2014 • Anik Dey, Pascale Fung

The aim of this paper is to investigate the rules and constraints of code-switching (CS) in Hindi-English mixed language data.

Paper
Add Code

Co-Training for Classification of Live or Studio Music Recordings

no code implementations • LREC 2014 • Nicolas Auguin, Pascale Fung

The fast-spreading development of online streaming services has enabled people from all over the world to listen to music.

Classification General Classification +2

Paper
Add Code

Code-Switch Language Model with Inversion Constraints for Mixed Language Speech Recognition

no code implementations • COLING 2012 • Ying Li, Pascale Fung

Language Modelling speech-recognition +1

Paper
Add Code

Using English Acoustic Models for Hindi Automatic Speech Recognition

no code implementations • WS 2012 • Anik Dey, Ying Li, Pascale Fung

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Cross-Lingual Language Modeling with Syntactic Reordering for Low-Resource Speech Recognition

no code implementations • EMNLP 2012 • Ping Xu, Pascale Fung

Language Modelling speech-recognition +1

Paper
Add Code

A Mandarin-English Code-Switching Corpus

no code implementations • LREC 2012 • Ying Li, Yue Yu, Pascale Fung

Generally the existing monolingual corpora are not suitable for large vocabulary continuous speech recognition (LVCSR) of code-switching speech.

Boundary Detection Language Identification +4

Paper
Add Code

A Multilingual Natural Stress Emotion Database

no code implementations • LREC 2012 • Xin Zuo, Tian Li, Pascale Fung

In this paper, we describe an ongoing effort in collecting and annotating a multilingual speech database of natural stress emotion from university students.

Emotion Recognition Speech Synthesis

Paper
Add Code
