no code implementations • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models.
In this work, we study the fine-tuning of pretrained models for sound event detection (SED).
Ranked #1 on Sound Event Detection on DESED (using extra training data)
We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions.
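One way to picture this automatic labelling step is the sketch below: a generate() placeholder stands in for whatever instruction-capable model is available, the model is prompted to invent the instruction a human-written document would answer, and a self-rating step keeps only the stronger pairs. The prompts, the 1-5 rating scale, and the threshold are illustrative assumptions, not the paper's exact recipe.

    # Sketch of the automatic instruction-labelling step. `generate` is a
    # placeholder for whatever language model is available; it is not an
    # API from the paper.

    def generate(prompt: str) -> str:
        """Call your language model of choice; stubbed here for illustration."""
        raise NotImplementedError

    def label_with_instruction(document: str) -> str:
        # Ask the model to invent the instruction this document would answer.
        prompt = (
            "Below is a response written by a person. Write the instruction "
            "that this response most plausibly answers.\n\n"
            f"Response:\n{document}\n\nInstruction:"
        )
        return generate(prompt).strip()

    def keep_pair(instruction: str, document: str, threshold: int = 4) -> bool:
        # Self-curation: have the model rate the (instruction, response) pair
        # and keep only high-scoring ones. The 1-5 scale and the threshold
        # are illustrative assumptions.
        prompt = (
            f"Instruction: {instruction}\nResponse: {document}\n"
            "Rate how well the response answers the instruction on a scale "
            "of 1 to 5. Answer with a single digit:"
        )
        try:
            return int(generate(prompt).strip()[0]) >= threshold
        except (ValueError, IndexError):
            return False

    def build_training_pairs(corpus):
        pairs = []
        for doc in corpus:
            instruction = label_with_instruction(doc)
            if keep_pair(instruction, doc):
                pairs.append({"instruction": instruction, "output": doc})
        return pairs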
We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations.
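As a rough illustration of what dual geometric representations can mean (concepts as boxes, fine-grained entities as vectors), here is a small NumPy sketch of an axis-aligned box, a containment-style concept-to-concept score, and a point-to-box compatibility score. The actual Concept2Box parameterization and training objective are not reproduced here.

    import numpy as np

    # Hedged sketch of dual geometric representations: a concept is an
    # axis-aligned box (min/max corners), an entity is a point vector.

    class Box:
        def __init__(self, low, high):
            self.low = np.asarray(low, dtype=float)
            self.high = np.asarray(high, dtype=float)

        def volume(self, eps=1e-6):
            return float(np.prod(np.clip(self.high - self.low, eps, None)))

    def subsumption_score(child: Box, parent: Box) -> float:
        # Fraction of the child box lying inside the parent box, a proxy for
        # P(parent | child) in box-embedding models.
        lo = np.maximum(child.low, parent.low)
        hi = np.minimum(child.high, parent.high)
        if np.any(hi <= lo):
            return 0.0
        return float(np.prod(hi - lo)) / child.volume()

    def entity_in_concept_score(entity, concept: Box) -> float:
        # Distance from an entity vector to the box, mapped to (0, 1]; a
        # stand-in for the vector-to-box score bridging the two KG views.
        gap = np.maximum(concept.low - entity, 0) + np.maximum(entity - concept.high, 0)
        return float(np.exp(-np.linalg.norm(gap)))

    animal = Box([0.0, 0.0], [4.0, 4.0])
    dog = Box([1.0, 1.0], [2.0, 2.0])
    print(subsumption_score(dog, animal))                         # ~1.0: dog inside animal
    print(entity_in_concept_score(np.array([1.5, 1.5]), animal))  # 1.0: point inside box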
In order to tackle both clip-level and frame-level tasks, this paper proposes two self-supervised audio representation learning methods: ATST-Clip and ATST-Frame, responsible for learning clip-level and frame-level representations, respectively.
Information extraction, e.g., attribute value extraction, has been extensively studied and formulated based only on text.
We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention.
Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Experts (MoE) have proven to be an efficient approach for scaling up Transformer model size for pretraining large language models.
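For readers unfamiliar with the building block, a minimal PyTorch sketch of a sparse MoE feed-forward layer with top-1 routing follows. Real S-FFN/MoE layers add load-balancing losses, capacity limits, and expert parallelism, all omitted here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Minimal sparse MoE feed-forward layer with top-1 routing.

    class MoEFeedForward(nn.Module):
        def __init__(self, d_model: int, d_ff: int, n_experts: int):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):
            # x: (num_tokens, d_model)
            gate = F.softmax(self.router(x), dim=-1)    # routing probabilities
            weight, expert_idx = gate.max(dim=-1)       # top-1 expert per token
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = expert_idx == e
                if mask.any():
                    # Only the routed tokens visit expert e => sparse compute.
                    out[mask] = weight[mask, None] * expert(x[mask])
            return out

    layer = MoEFeedForward(d_model=16, d_ff=64, n_experts=4)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])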
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.
no code implementations • 22 Dec 2022 • Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov
To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks.
Ranked #15 on Natural Language Inference on RTE
Mutations in the STK11/LKB1 and NF1 genes have been found in ADC and SQC and are often associated with drug resistance and poor prognosis, but STK11/NF1 co-mutation has not been reported, and more cases are needed to reveal the association.
Recent work in multilingual translation advances translation quality beyond bilingual baselines by using deep transformer models with increased capacity.
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.
Deep face recognition has achieved great success due to large-scale training databases and rapidly developing loss functions.
Ranked #2 on Face Verification on CALFW
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arise from constantly changing data.
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data and then transfers that knowledge to a specific problem with a limited amount of labeled data.
Ranked #2 on Spoken Command Recognition on Speech Command v2
Two pre-training configurations for speech translation and recognition, respectively, are presented to alleviate subtask interference.
All-MLP architectures have attracted increasing interest as an alternative to attention-based models.
Ranked #1 on Zero-Shot Learning on HellaSwag
1 code implementation • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models.
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.
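A hedged sketch of this recipe: take a pretrained speech encoder and a pretrained text decoder (both placeholders below, e.g. a wav2vec-style encoder and an mBART-style decoder), connect them with a small trainable adapter, and freeze most pretrained parameters. Which sub-modules the paper actually fine-tunes is a design choice not reproduced here.

    import torch
    import torch.nn as nn

    # Sketch: reuse a pretrained speech encoder and a pretrained text decoder,
    # bridge them with a small trainable adapter, freeze most parameters.
    # `speech_encoder` and `text_decoder` are placeholders for whatever
    # pretrained modules are loaded.

    class SpeechToTextTranslator(nn.Module):
        def __init__(self, speech_encoder: nn.Module, text_decoder: nn.Module,
                     enc_dim: int, dec_dim: int):
            super().__init__()
            self.speech_encoder = speech_encoder
            self.text_decoder = text_decoder
            # Adapter maps (and downsamples) speech states into the decoder's
            # embedding space.
            self.adapter = nn.Sequential(
                nn.Conv1d(enc_dim, dec_dim, kernel_size=3, stride=2, padding=1),
                nn.GELU(),
            )

        def forward(self, speech_feats, target_tokens):
            enc = self.speech_encoder(speech_feats)                  # (B, T, enc_dim)
            enc = self.adapter(enc.transpose(1, 2)).transpose(1, 2)  # (B, T', dec_dim)
            # The decoder is assumed to cross-attend to `enc`.
            return self.text_decoder(target_tokens, enc)

    def freeze_for_efficient_transfer(model: SpeechToTextTranslator):
        # Freeze everything, then unfreeze only the adapter; unfreezing extra
        # sub-modules (e.g. layer norms or attention) is a further option.
        for p in model.parameters():
            p.requires_grad = False
        for p in model.adapter.parameters():
            p.requires_grad = True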
In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.
Pretraining and multitask learning are widely used to improve speech-to-text translation performance.
Procedural fairness has been a public concern; it leads to controversy when decisions are made with respect to protected classes such as race, social status, and disability.
We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?
Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters to improve positive transfer and mitigate interference.
We show that the common training method of upsampling low-resource languages cannot robustly optimize the population loss, risking either underfitting high-resource languages or overfitting low-resource ones.
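For context, the upsampling in question is commonly implemented as temperature-based sampling, with the probability of drawing data from language i proportional to its data share raised to 1/T. The snippet below, with made-up data sizes, shows how T interpolates between the two failure modes: T = 1 follows the natural distribution (low-resource pairs are rarely seen), while a very large T flattens it (low-resource pairs are heavily oversampled).

    # Temperature-based sampling over language data sizes:
    # p_i ∝ (n_i / N) ** (1 / T). Data sizes below are made up.

    def sampling_probs(sizes, temperature):
        total = sum(sizes.values())
        weights = {lang: (n / total) ** (1.0 / temperature)
                   for lang, n in sizes.items()}
        z = sum(weights.values())
        return {lang: w / z for lang, w in weights.items()}

    sizes = {"en-fr": 10_000_000, "en-hi": 500_000, "en-gu": 20_000}
    for T in (1, 5, 100):
        print(T, {k: round(v, 3) for k, v in sampling_probs(sizes, T).items()})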
The training of a deep face recognition system usually suffers from interference by label noise in the training data.
The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.
Active learning (AL) algorithms may achieve better performance with less data because the model guides the data selection process.
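A generic pool-based uncertainty-sampling loop, sketched below with scikit-learn on synthetic data, shows the sense in which the model guides data selection; the specific AL strategies evaluated in the paper are not reproduced.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Pool-based active learning with least-confidence sampling: train on the
    # labelled pool, query labels for the points the model is least sure
    # about, repeat. Data and query budget here are synthetic.

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # hidden labelling rule

    labelled = list(rng.choice(len(X), size=20, replace=False))
    unlabelled = [i for i in range(len(X)) if i not in set(labelled)]

    for step in range(5):
        clf = LogisticRegression().fit(X[labelled], y[labelled])
        proba = clf.predict_proba(X[unlabelled])
        uncertainty = 1.0 - proba.max(axis=1)          # least-confident first
        query = [unlabelled[i] for i in np.argsort(-uncertainty)[:20]]
        labelled.extend(query)                         # "ask the oracle" for labels
        unlabelled = [i for i in unlabelled if i not in set(query)]
        print(f"step {step}: accuracy = {clf.score(X, y):.3f}, labelled = {len(labelled)}")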
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from a pretrained speech encoder and text decoder.
Fuzzing is one of the most effective techniques for identifying potential software vulnerabilities.