no code implementations • EMNLP 2020 • Hui Su, Xiaoyu Shen, Zhou Xiao, Zheng Zhang, Ernie Chang, Cheng Zhang, Cheng Niu, Jie Zhou
In this work, we take a close look at the movie domain and present a large-scale, high-quality corpus with fine-grained annotations, in the hope of pushing the limits of movie-domain chatbots.
no code implementations • ACL 2022 • Hui Su, Weiwei Shi, Xiaoyu Shen, Zhou Xiao, Tuo Ji, Jiarui Fang, Jie Zhou
Large-scale pretrained language models have achieved state-of-the-art (SOTA) results on many NLP tasks.
no code implementations • ECNLP (ACL) 2022 • Xiaoyu Shen, Gianni Barlacchi, Marco del Tredici, Weiwei Cheng, Adrià de Gispert
To fill this gap, we study how to effectively incorporate semi-structured answer sources for PQA, focusing on presenting answers in natural, fluent sentences.
no code implementations • ECNLP (ACL) 2022 • Xiaoyu Shen, Gianni Barlacchi, Marco del Tredici, Weiwei Cheng, Bill Byrne, Adrià de Gispert
In this paper, we build a benchmark with annotations for both evidence selection and answer generation covering 6 information sources.
1 code implementation • 27 Aug 2022 • Qingyu Zhang, Xiaoyu Shen, Ernie Chang, Jidong Ge, Pengke Chen
In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation across low- to high-resource languages.
no code implementations • 5 Aug 2022 • Xiaoyu Shen, Svitlana Vakulenko, Marco del Tredici, Gianni Barlacchi, Bill Byrne, Adrià de Gispert
Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) have achieved significant advances and become a key component of modern open-domain question-answering systems.
no code implementations • 15 May 2022 • Dawei Zhu, Xiaoyu Shen, Michael A. Hedderich, Dietrich Klakow
However, labels from weak supervision can be rather noisy, and the high capacity of DNNs makes them prone to overfitting these noisy labels.
1 code implementation • NAACL 2022 • David Ifeoluwa Adelani, Jesujoba Oluwadara Alabi, Angela Fan, Julia Kreutzer, Xiaoyu Shen, Machel Reid, Dana Ruiter, Dietrich Klakow, Peter Nabende, Ernie Chang, Tajuddeen Gwadabe, Freshia Sackey, Bonaventure F. P. Dossou, Chris Chinenye Emezue, Colin Leong, Michael Beukman, Shamsuddeen Hassan Muhammad, Guyo Dub Jarso, Oreen Yousuf, Andre Niyongabo Rubungo, Gilles Hacheme, Eric Peter Wairagala, Muhammad Umair Nasir, Benjamin Ayoade Ajibade, Tunde Oluwaseyi Ajayi, Yvonne Wambui Gitau, Jade Abbott, Mohamed Ahmed, Millicent Ochieng, Anuoluwapo Aremu, Perez Ogayo, Jonathan Mukiibi, Fatoumata Ouoba Kabore, Godson Koffi Kalipe, Derguene Mbaye, Allahsera Auguste Tapo, Victoire Memdjokam Koagne, Edwin Munkoh-Buabeng, Valencia Wagner, Idris Abdulmumin, Ayodele Awokoya, Happy Buzaaba, Blessing Sibanda, Andiswa Bukula, Sam Manthalu
We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training?
no code implementations • 11 Apr 2022 • Junyun Cui, Xiaoyu Shen, Feiping Nie, Zheng Wang, Jinglong Wang, Yulong Chen
In this paper, to address the current lack of a comprehensive survey of existing LJP tasks, datasets, models and evaluations, (1) we analyze 31 LJP datasets in 6 languages, present their construction processes and define a classification method for LJP along 3 different attributes; (2) we summarize 14 evaluation metrics under four categories for the different outputs of LJP tasks; (3) we review 12 legal-domain pretrained models in 3 languages and highlight 3 major research directions for LJP; (4) we show the state-of-the-art results for 8 representative datasets from different court cases and discuss the open challenges.
no code implementations • NLP4ConvAI (ACL) 2022 • Marco del Tredici, Xiaoyu Shen, Gianni Barlacchi, Bill Byrne, Adrià de Gispert
In conversational QA, models have to leverage information in previous turns to answer upcoming questions.
no code implementations • 3 Mar 2022 • Xiaoyu Shen
Text generation aims to produce human-like natural language output for downstream tasks.
1 code implementation • 28 Feb 2022 • Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf
In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate).
no code implementations • 16 Dec 2021 • Rongzhi Zhang, Yulong Gu, Xiaoyu Shen, Hui Su
We introduce a time interval embedding to represent the time gap between the item to be predicted and each historical click, and use it to replace the position embedding in the original transformer (which we call the temporal transformer).
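The idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the bucketing scheme (log-scale, capped) and the table values are assumptions, and a real model would learn the embedding table jointly with the transformer.

```python
# Sketch: look up an embedding for the time gap between the prediction
# time and each historical click, in place of a position embedding.
import math

NUM_BUCKETS = 16
EMB_DIM = 4

# Hypothetical embedding table: one vector per interval bucket
# (in practice these would be learned parameters).
interval_embeddings = [[0.01 * b + 0.1 * d for d in range(EMB_DIM)]
                       for b in range(NUM_BUCKETS)]

def interval_bucket(delta_seconds: float) -> int:
    """Discretize a time gap into a log-scale bucket (assumed scheme)."""
    if delta_seconds < 1:
        return 0
    return min(int(math.log2(delta_seconds)), NUM_BUCKETS - 1)

def temporal_embeddings(pred_time: float, click_times: list) -> list:
    """One embedding per historical click, indexed by its time gap
    to the item being predicted, instead of by sequence position."""
    return [interval_embeddings[interval_bucket(pred_time - t)]
            for t in click_times]

# Usage: three historical clicks at increasing distance from pred_time.
embs = temporal_embeddings(1000.0, [999.0, 900.0, 0.0])
```

Unlike a position embedding, two clicks that happened equally long before the prediction time receive the same embedding regardless of where they sit in the click sequence.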
1 code implementation • 13 Dec 2021 • Yunyun Huang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Bin Luo
Given the fact of a case, Legal Judgment Prediction (LJP) involves a series of sub-tasks such as predicting violated law articles, charges and term of penalty.
no code implementations • 2 Dec 2021 • Ze Tang, Chuanyi Li, Jidong Ge, Xiaoyu Shen, Zheling Zhu, Bin Luo
Code summarization aims to generate brief natural language descriptions for source code.
1 code implementation • EMNLP 2021 • David Ifeoluwa Adelani, Miaoran Zhang, Xiaoyu Shen, Ali Davody, Thomas Kleinbauer, Dietrich Klakow
Documents as short as a single sentence may inadvertently reveal sensitive information about their authors, including e.g. their gender or ethnicity.
no code implementations • INLG (ACL) 2021 • Ernie Chang, Xiaoyu Shen, Alex Marin, Vera Demberg
We propose a shared task on training instance selection for few-shot neural text generation.
no code implementations • ACL 2021 • Ernie Chang, Xiaoyu Shen, Hui-Syuan Yeh, Vera Demberg
In this work, we present a study on training instance selection in few-shot neural text generation.
1 code implementation • 21 Apr 2021 • Jidong Ge, Yunyun Huang, Xiaoyu Shen, Chuanyi Li, Wei Hu
We believe that learning fine-grained correspondence between each single fact and law articles is crucial for an accurate and trustworthy AI system.
no code implementations • EACL 2021 • Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su
Our approach automatically augments the data available for training by (i) generating new text samples based on replacing specific values by alternative ones from the same category, (ii) generating new text samples based on GPT-2, and (iii) proposing an automatic method for pairing the new text samples with data samples.
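Step (i) of this augmentation can be sketched as below. The category table, function name, and example data are all hypothetical, used only to illustrate swapping a slot value for an alternative from the same category; the paper's actual pipeline (including the GPT-2 step and the pairing method) is not reproduced here.

```python
import random

# Toy "category -> possible values" table (illustrative only).
CATEGORIES = {
    "city": ["Berlin", "Paris", "Madrid"],
    "cuisine": ["Italian", "Thai", "Mexican"],
}

def augment_by_value_swap(text, slots, rng):
    """Create a new text sample by replacing each annotated slot value
    with a different value drawn from the same category."""
    new_text, new_slots = text, {}
    for category, value in slots.items():
        alternatives = [v for v in CATEGORIES[category] if v != value]
        replacement = rng.choice(alternatives)
        new_text = new_text.replace(value, replacement)
        new_slots[category] = replacement
    return new_text, new_slots

# Usage: one annotated sample yields a new, value-swapped sample.
sample, slots = augment_by_value_swap(
    "A great Italian restaurant in Berlin.",
    {"city": "Berlin", "cuisine": "Italian"},
    random.Random(0),
)
```

Because the replacement comes from the same category, the augmented sentence stays grammatical and its slot annotations remain valid, which is what makes the new pairs usable as extra training data.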
no code implementations • COLING 2020 • Binxia Xu, Siyuan Qiu, Jie Zhang, Yafang Wang, Xiaoyu Shen, Gerard de Melo
Utterance classification is a key component in many conversational systems.
2 code implementations • 13 Nov 2020 • Liqiang Wang, Xiaoyu Shen, Gerard de Melo, Gerhard Weikum
Prior work has focused on supervised learning with training data from the same domain.
no code implementations • COLING 2020 • Ernie Chang, Jeriah Caplinger, Alex Marin, Xiaoyu Shen, Vera Demberg
We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for the general task of labeling structured data with textual descriptions.
no code implementations • 22 Jul 2020 • Aditya Mogadala, Xiaoyu Shen, Dietrich Klakow
Particularly, these image features are subdivided into global and local features, where global features are extracted from the global representation of the image, while local features are extracted from the objects detected locally in an image.
1 code implementation • ACL 2020 • Hui Su, Xiaoyu Shen, Sanqiang Zhao, Xiao Zhou, Pengwei Hu, Randy Zhong, Cheng Niu, Jie Zhou
Neural network-based sequence-to-sequence (seq2seq) models strongly suffer from the low-diversity problem when it comes to open-domain dialogue generation.
no code implementations • ACL 2020 • Xiaoyu Shen, Ernie Chang, Hui Su, Jie Zhou, Dietrich Klakow
The neural attention model has achieved great success in data-to-text generation tasks.
no code implementations • 18 Mar 2020 • Ernie Chang, David Ifeoluwa Adelani, Xiaoyu Shen, Vera Demberg
In this work, we develop techniques targeted at bridging the gap between Pidgin English and English in the context of natural language generation.
no code implementations • IJCNLP 2019 • Xiaoyu Shen, Yang Zhao, Hui Su, Dietrich Klakow
Pointer Generators have been the de facto standard for modern summarization systems.
1 code implementation • IJCNLP 2019 • Xiaoyu Shen, Jun Suzuki, Kentaro Inui, Hui Su, Dietrich Klakow, Satoshi Sekine
As a result, the content to be described in the text cannot be explicitly controlled.
no code implementations • ACL 2019 • Yang Zhao, Xiaoyu Shen, Wei Bi, Akiko Aizawa
First, the word graph approach that simply concatenates fragments from multiple sentences may yield non-fluent or ungrammatical compression.
1 code implementation • ACL 2019 • Hui Su, Xiaoyu Shen, Rongzhi Zhang, Fei Sun, Pengwei Hu, Cheng Niu, Jie Zhou
To properly train the utterance rewriter, we collect a new dataset with human annotations and introduce a Transformer-based utterance rewriting architecture using the pointer network.
no code implementations • EMNLP 2018 • Hui Su, Xiaoyu Shen, Wenjie Li, Dietrich Klakow
Sequence-to-Sequence (seq2seq) models have become overwhelmingly popular in building end-to-end trainable dialogue systems.
no code implementations • 6 Feb 2018 • Xiaoyu Shen, Hui Su, Shuzi Niu, Vera Demberg
Variational encoder-decoders (VEDs) have shown promising results in dialogue generation.
13 code implementations • IJCNLP 2017 • Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu
We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects.
no code implementations • ACL 2017 • Xiaoyu Shen, Hui Su, Yanran Li, Wenjie Li, Shuzi Niu, Yang Zhao, Akiko Aizawa, Guoping Long
Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems.