The Differentiable Search Index (DSI) is a new, emerging paradigm for information retrieval.
To combat this issue, we propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks using Knowledge Mixture Training (KoMT).
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning only from moderate negatives and/or serving as an auxiliary module for a retriever.
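As background for the 'retrieval & rerank' pipeline mentioned above, here is a minimal two-stage sketch using the sentence-transformers library; the model names and toy documents are illustrative assumptions, not this paper's setup.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")                 # fast bi-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")     # expensive reranker

docs = ["doc one ...", "doc two ...", "doc three ..."]
doc_emb = retriever.encode(docs, convert_to_tensor=True)

query = "example query"
q_emb = retriever.encode(query, convert_to_tensor=True)

# Stage 1: recall candidates with cheap vector similarity.
hits = util.semantic_search(q_emb, doc_emb, top_k=2)[0]

# Stage 2: rescore each (query, doc) pair with the cross-encoder.
pairs = [(query, docs[h["corpus_id"]]) for h in hits]
scores = reranker.predict(pairs)
reranked = sorted(zip(pairs, scores), key=lambda x: -x[1])
```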
On the multilingual sentence retrieval task Tatoeba, our model achieves new SOTA results among methods without using bilingual data.
The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity.
As more and more pre-trained language models adopt on-cloud deployment, privacy issues grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts).
The learn-to-compare paradigm of contrastive representation learning (CRL), which compares positive samples with negative ones for representation learning, has achieved great success in a wide range of domains, including natural language processing, computer vision, information retrieval and graph learning.
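The learn-to-compare objective is commonly instantiated as an InfoNCE loss; the sketch below shows that generic form (a plain PyTorch assumption, not any single paper's exact loss).

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """Generic contrastive (InfoNCE) loss: each anchor's positive is the
    matching row; all other rows in the batch act as negatives."""
    a = F.normalize(anchor, dim=-1)          # (B, D)
    p = F.normalize(positive, dim=-1)        # (B, D)
    logits = a @ p.t() / temperature         # (B, B) cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

# Usage: anchor/positive come from two views or encoders of the same item.
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```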
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Although spoken language understanding (SLU) has achieved great success in high-resource languages, such as English, it remains challenging in low-resource languages mainly due to the lack of high quality training data.
(2) How to cohere with the context and preserve knowledge when generating a stylized response.
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks (xSL), such as cross-lingual machine reading comprehension (xMRC), by transferring knowledge from a high-resource language to low-resource languages.
Knowledge graph (KG) based Collaborative Filtering is an effective approach to personalizing recommendation systems for relatively static domains such as movies and books, by leveraging structured information from KG to enrich both item and user representations.
Conversational recommender systems (CRSs) have received extensive attention in recent years.
To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph.
Second, to prevent multi-view embeddings from collapsing to the same one, we further propose a global-local loss with annealed temperature to encourage the multiple viewers to better align with different potential queries.
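As a rough illustration of an annealed temperature, here is one hypothetical linear schedule; the actual schedule and the global-local loss itself are not reproduced here.

```python
def annealed_temperature(step, total_steps, t_start=0.3, t_end=0.05):
    """Linearly anneal the softmax temperature: a high early temperature
    softens the contrast so the multiple viewers spread out; a low late
    temperature sharpens alignment to specific (pseudo-)queries."""
    frac = min(step / max(total_steps, 1), 1.0)
    return t_start + (t_end - t_start) * frac
```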
Generating new events given a context of correlated ones plays a crucial role in many event-centric reasoning tasks.
This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks.
Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.
A straightforward solution is to resort to more diverse positives from a multi-augmenting strategy, while it remains an open question how to learn, without supervision, from diverse positives of uneven augmentation quality in the text field.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
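A minimal sketch of how such text-code pairs can be mined from Python sources with the standard library (Python 3.9+ for ast.unparse); the paper's pipeline, which also exploits in-line comments, is more involved.

```python
import ast, textwrap

def text_code_pairs(source: str):
    """Pair each function's documentation with its code, mirroring how
    bimodal (text, code) training pairs can be mined from source files."""
    pairs = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                pairs.append((doc, ast.unparse(node)))
    return pairs

example = textwrap.dedent('''
    def add(a, b):
        """Return the sum of a and b."""
        return a + b
''')
print(text_code_pairs(example))
```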
Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages.
To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively (a unified flattening sketch follows this entry).
Ranked #1 on Text-to-Video Generation on Kinetics
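To make the 1D/2D/3D unification above concrete, a toy sketch follows: it flattens text, image, and video features into one token sequence for a shared encoder. The random linear projection stands in for learned patch embeddings, and all shapes are illustrative assumptions.

```python
import torch

def to_tokens(x, d_model=512):
    """Flatten text (L, D), image (H, W, C) or video (T, H, W, C) features
    into one (N, d_model) token sequence for a shared encoder-decoder.
    The projection here is a random placeholder for a learned embedding."""
    flat = x.reshape(-1, x.shape[-1])            # collapse all spatial/temporal axes
    proj = torch.nn.Linear(x.shape[-1], d_model)
    return proj(flat)                            # (N, d_model)

text = torch.randn(16, 300)          # 1D: 16 word vectors
image = torch.randn(14, 14, 768)     # 2D: 14x14 patch features
video = torch.randn(4, 14, 14, 768)  # 3D: 4 frames of patch features
print(to_tokens(text).shape, to_tokens(image).shape, to_tokens(video).shape)
```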
The sequence representation plays a key role in the learning of matching degree between the dialogue context and the response.
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model.
In this paper, we present a pre-trained language model (PLM) based framework called RID for conversational recommender system (CRS).
Event correlation reasoning infers whether a natural language paragraph containing multiple events conforms to human common sense.
For this task, the adoption of pre-trained language models (such as BERT) has led to remarkable progress in a number of benchmarks.
Second, only the items mentioned in the training corpus have a chance to be recommended in the conversation.
We study the problem of coarse-grained response selection in retrieval-based dialogue systems.
Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process.
In recent years, worldwide business activity in online discussions and opinion sharing on social media has been booming.
Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models.
We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads.
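A hypothetical sketch of that sampling step: approximate the attention scores with low-rank projections, treat them as a distribution over token pairs, and draw a different sparse index matrix per head. All names and sizes here are assumptions, not the paper's code.

```python
import torch

def sample_sparse_indices(q, k, num_heads=4, pairs_per_head=64, rank=8):
    """Approximate attention with low-rank ('sketched') projections, turn
    the scores into a distribution over token pairs, and sample a distinct
    sparse index set for each attention head."""
    L, D = q.shape
    proj = torch.randn(D, rank) / rank ** 0.5
    sketch = (q @ proj) @ (k @ proj).t()         # (L, L) sketched attention
    probs = torch.softmax(sketch.flatten(), dim=0)
    heads = []
    for _ in range(num_heads):
        idx = torch.multinomial(probs, pairs_per_head, replacement=False)
        heads.append(torch.stack((idx // L, idx % L), dim=-1))  # (pairs, 2)
    return heads  # one (row, col) index matrix per attention head

indices = sample_sparse_indices(torch.randn(128, 64), torch.randn(128, 64))
```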
Procedural text understanding aims at tracking the states (e.g., create, move, destroy) and locations of the entities mentioned in a given paragraph.
Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks.
Faced with increased compute requirements and low resources for language expansion, we build a single universal model for improving the quality and reducing run-time costs of our production system.
Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction.
That is, we can only access training data in a high-resource language, while needing to answer multilingual questions without any labeled data in the target languages.
Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants.
Finding code given a natural language query is beneficial to the productivity of software developers.
The retriever aims to retrieve a correlated image to the dialog from an image index, while the visual concept detector extracts rich visual knowledge from the image.
Logical reasoning of text requires understanding critical logical information in the text and performing inference over it.
Ranked #4 on Reading Comprehension on ReClor
ProphetNet is a pre-training based natural language generation method which shows powerful performance on English text summarization and question generation tasks.
In this work, we conduct a thorough examination of pretrained model based unsupervised sentence embeddings.
Rule-based dialogue management is still the most popular solution for industrial task-oriented dialogue systems due to its interpretability.
3 code implementations • 9 Feb 2021 • Shuai Lu, Daya Guo, Shuo Ren, JunJie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
Ranked #1 on Cloze Test on CodeXGLUE - CT-maxmin
We notice that some real-world QA tasks are more complex, which cannot be solved by end-to-end neural networks or translated to any kind of formal representations.
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
When multiple teacher models are available for distillation, state-of-the-art methods assign each teacher a fixed weight for the whole distillation process.
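For concreteness, here is the fixed-weight baseline the entry describes, as a plain PyTorch sketch with soft-label KL distillation; the adaptive weighting the paper itself proposes is not reproduced.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd(student_logits, teacher_logits_list, weights, T=2.0):
    """Fixed-weight multi-teacher distillation: each teacher contributes a
    constant weight to the KL term for the entire run."""
    log_p = F.log_softmax(student_logits / T, dim=-1)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_list):
        q = F.softmax(t_logits / T, dim=-1)
        loss = loss + w * F.kl_div(log_p, q, reduction="batchmean") * T * T
    return loss

# Usage: two teachers with equal fixed weights.
s = torch.randn(8, 10)
loss = multi_teacher_kd(s, [torch.randn(8, 10), torch.randn(8, 10)], [0.5, 0.5])
```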
2 code implementations • Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan
Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP).
To tackle the challenge of scarce training data in low-resource languages, we develop a novel unsupervised phrase boundary recovery pre-training task to enhance the multilingual boundary detection capability of CalibreNet.
Then, we devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA).
We focus on the task of reasoning over paragraph effects in situations, which requires a model to understand the cause and effect described in a background paragraph, and apply the knowledge to a novel situation.
In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
The Natural Questions (NQ) benchmark set brings new challenges to Machine Reading Comprehension: the answers are not only at different levels of granularity (long and short), but also of richer types (including no-answer, yes/no, single-span and multi-span).
Specifically, a cascaded labeling module is developed to enhance the interchange between aspect terms and improve the attention of sentiment tokens when labeling sentiment polarities.
In a multi-turn knowledge-grounded dialog, the difference between the knowledge selected at different turns usually provides potential clues to knowledge selection, which has been largely neglected in previous research.
1 code implementation • Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou
Instead of using a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage: a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables (a toy illustration follows this entry).
Ranked #1 on Type prediction on ManyTypes4TypeScript
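A toy illustration of the "where-the-value-comes-from" relation, assuming Python sources and only straight-line assignments; the actual data-flow extraction is considerably more complete.

```python
import ast

def value_flow_edges(source: str):
    """Link each variable read back to the most recent assignment of that
    name, yielding (variable, defined-at-line, used-at-line) edges."""
    last_def, edges = {}, []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            for name in ast.walk(stmt.value):
                if isinstance(name, ast.Name) and name.id in last_def:
                    edges.append((name.id, last_def[name.id], stmt.lineno))
            for tgt in stmt.targets:
                if isinstance(tgt, ast.Name):
                    last_def[tgt.id] = stmt.lineno
    return edges

# [('x', 1, 2), ('x', 1, 3), ('y', 2, 3)]
print(value_flow_edges("x = 1\ny = x + 2\nz = x + y\n"))
```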
To address these issues, in this paper, we propose learning a context-response matching model with auxiliary self-supervised tasks designed for the dialogue data based on pre-trained language models.
Ranked #2 on Conversational Response Selection on E-commerce
It is common for people to create different types of charts to explore a multi-dimensional dataset (table).
Generating inferential texts about an event from different perspectives requires reasoning over the different contexts in which the event occurs.
In this paper, we make the first study to explore the correlation between user behavior and passage relevance, and propose a novel approach for mining training data for Web QA.
In this paper, we formalize the music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance.
Natural Questions is a new challenging machine reading comprehension benchmark with two-grained answers, which are a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer).
The representations are then fed into the predictor to obtain the span of the short answer, the paragraph of the long answer, and the answer type in a cascaded manner.
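One way such a cascaded predictor could be wired is sketched below; the hidden size, answer-type inventory, and head wiring are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CascadedHead(nn.Module):
    """Illustrative cascaded predictor for the two-grained NQ answers:
    short-answer span boundaries per token, a long-answer logit per
    paragraph, and an answer type from the pooled representation."""
    def __init__(self, hidden=768, num_types=5):
        super().__init__()
        self.span = nn.Linear(hidden, 2)        # start/end logits per token
        self.paragraph = nn.Linear(hidden, 1)   # long-answer logit per paragraph
        self.answer_type = nn.Linear(hidden, num_types)

    def forward(self, token_reprs, paragraph_reprs, pooled):
        start, end = self.span(token_reprs).split(1, dim=-1)
        return (start.squeeze(-1), end.squeeze(-1),
                self.paragraph(paragraph_reprs).squeeze(-1),
                self.answer_type(pooled))

head = CascadedHead()
out = head(torch.randn(2, 384, 768), torch.randn(2, 8, 768), torch.randn(2, 768))
```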
We study the detection of propagandistic text fragments in news articles.
Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low-resource languages.
The graph is used to obtain graph-enhanced contextual representations of words in Transformer-based architecture.
In this work, we introduce a learning algorithm which directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
Furthermore, we propose a simple and effective method to mine the keyphrases of interest in a news article and build the first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned <news article, headline, keyphrase> triples.
In this work, we use multiple knowledge sources as fuels for the model.
2 code implementations • 3 Apr 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou
In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.
Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT.
Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks.
Ranked #1 on Code Documentation Generation on CodeSearchNet - Go
We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa.
Ranked #1 on Entity Typing on Open Entity
The experiment results show that our method can significantly outperform the baseline methods and even achieve comparable results with the original teacher models, along with substantial speedup of model inference.
We consider the problem of conversational question answering over a large-scale knowledge base.
Neural semantic parsing has achieved impressive results in recent years, yet its success relies on the availability of large amounts of supervised data.
In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence.
Ranked #7 on Common Sense Reasoning on CommonsenseQA
These two problems lead to a poorly-trained semantic parsing model.
On XNLI, a 1.8% average accuracy improvement (across 15 languages) is obtained.
We propose Unicoder-VL, a universal encoder that aims to learn joint representations of vision and language in a pre-training manner.
Ranked #2 on Text-Image Retrieval on COCO (image as query)
We develop a new paradigm for the task of joint entity relation extraction.
Ranked #1 on Relation Extraction on ACE 2005 (Sentence Encoder metric)
Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.
Deep pre-training and fine-tuning models (like BERT and OpenAI GPT) have demonstrated excellent results in question answering.
We present assertion based question answering (ABQA), an open domain question answering task that takes a question and a passage as inputs, and outputs a semi-structured assertion consisting of a subject, a predicate and a list of arguments.