In recent years, substantial advances in pre-trained language models have paved the way for the development of numerous versions of these models for languages other than English, with a particular focus on encoder-only and decoder-only architectures.
Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible when solving the task.
Lifelong language learning aims to train models that learn multiple tasks sequentially without suffering from catastrophic forgetting.
Based on these insights, we propose CAWS (Consistency AWare Sampling), a novel storage policy that leverages a learning consistency score (C-Score) to populate the memory with examples that are easy to learn and representative of previous tasks.
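As a rough illustration of how such a consistency-aware storage policy could work, the sketch below uses a simplified stand-in for the C-Score (the fraction of training checkpoints at which an example is predicted correctly) and fills a fixed-size replay memory with the highest-scoring examples. The function names and score definition are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def consistency_scores(correct_history):
    """Proxy consistency score: fraction of training checkpoints at which
    each example was predicted correctly (shape [n_examples, n_checkpoints])."""
    return correct_history.mean(axis=1)

def populate_memory(examples, correct_history, budget):
    """Fill the replay memory with the `budget` most consistently learned examples."""
    scores = consistency_scores(correct_history)
    keep = np.argsort(-scores)[:budget]  # indices of the highest-scoring examples
    return [examples[i] for i in keep]

# Toy usage: five examples tracked over four checkpoints, memory budget of two.
history = np.array([[1, 1, 1, 1],
                    [0, 1, 1, 1],
                    [0, 0, 1, 0],
                    [1, 0, 1, 1],
                    [0, 0, 0, 1]])
print(populate_memory(["a", "b", "c", "d", "e"], history, budget=2))
```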
In this paper we present ALBETO and DistilBETO, which are versions of ALBERT and DistilBERT pre-trained exclusively on Spanish corpora.
Due to the success of pre-trained language models, versions for languages other than English have been released in recent years.
Current language models are usually trained with a self-supervised scheme, where the main focus is on learning representations at the word or sentence level.
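For example, BERT-style models learn word-level representations via masked-token prediction, a self-supervised objective in which the labels come from the input text itself. The sketch below shows only the masking step, simplified relative to BERT (which masks roughly 15% of tokens and applies an 80/10/10 mask/random/keep rule):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_symbol="[MASK]"):
    """Hide a random subset of tokens; the model must reconstruct them,
    so supervision comes from the text itself (no manual labels)."""
    masked, targets = [], []
    for position, token in enumerate(tokens):
        if random.random() < mask_prob:
            masked.append(mask_symbol)
            targets.append((position, token))  # what the model must predict
        else:
            masked.append(token)
    return masked, targets

random.seed(1)
print(mask_tokens("language models learn from raw text".split()))
```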
The success of pre-trained word embeddings has motivated their use in the biomedical domain, with contextualized embeddings yielding remarkable results in several biomedical NLP tasks.
Recently, few-shot learning has received increasing interest.
DACT-BERT adds an adaptive computation mechanism to the regular processing pipeline of BERT, adjusting the number of Transformer blocks executed for each input at inference time.
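DACT itself is a differentiable mechanism trained end to end; as a rough illustration of the adaptive-computation idea only, the sketch below uses a simpler, non-differentiable variant: an auxiliary classifier after each Transformer block, with inference halting once the running prediction is confident enough. All dimensions, thresholds, and module names are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveDepthEncoder(nn.Module):
    """Illustrative adaptive-computation wrapper: after each block, an
    auxiliary head produces a prediction; at inference we stop early
    once its confidence exceeds a threshold (simplified vs. DACT)."""

    def __init__(self, d_model=128, n_layers=6, n_classes=2, threshold=0.9):
        super().__init__()
        make_block = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.ModuleList(make_block() for _ in range(n_layers))
        self.heads = nn.ModuleList(nn.Linear(d_model, n_classes) for _ in range(n_layers))
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):                          # x: [batch, seq, d_model]
        for depth, (block, head) in enumerate(zip(self.blocks, self.heads), 1):
            x = block(x)
            probs = head(x[:, 0]).softmax(-1)      # classify from the first token
            if probs.max(-1).values.min() >= self.threshold:
                return probs, depth                # every example confident: halt
        return probs, depth                        # fell through: used all layers

model = AdaptiveDepthEncoder()
probs, layers_used = model(torch.randn(2, 16, 128))
print(layers_used)
```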
Our evaluation indicates that both the Transformer architecture and the contextual information are essential to achieving the best results on this item recommendation task.
The success of the pre-trained word embeddings of the BERT model has motivated its use in biomedical-domain tasks.
We propose a multi-head attention mechanism as a blending layer in a neural network model that translates natural language into a high-level behavioral language for indoor robot navigation.
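A minimal sketch of what such a blending layer might look like: cross-attention where the instruction encoding queries a second encoded stream, followed by a projection onto a behavioral vocabulary. The dimensions, stream names, and the choice of `nn.MultiheadAttention` are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionBlender(nn.Module):
    """Sketch: blend a language-instruction encoding with a second encoded
    stream via multi-head cross-attention (all dimensions hypothetical)."""

    def __init__(self, d_model=256, n_heads=8, n_actions=32):
        super().__init__()
        self.blend = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_behavior = nn.Linear(d_model, n_actions)  # behavioral vocabulary

    def forward(self, instr_enc, context_enc):
        # Queries come from the instruction; keys/values from the context,
        # so each instruction token attends over the navigation context.
        blended, _ = self.blend(instr_enc, context_enc, context_enc)
        return self.to_behavior(blended)  # per-token behavioral logits

blender = AttentionBlender()
logits = blender(torch.randn(1, 12, 256), torch.randn(1, 20, 256))
print(logits.shape)  # torch.Size([1, 12, 32])
```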
We also show that we can significantly improve the robustness of the models by training them with adversarial examples.
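As a concrete illustration of one standard way to do this (not necessarily the procedure used here), the sketch below generates FGSM-style adversarial perturbations in embedding space and mixes them into the training loss; the toy model and dimensions are hypothetical.

```python
import torch
import torch.nn.functional as F

def fgsm_adversarial_step(model, embeds, labels, epsilon=0.01):
    """One adversarial-training step: perturb input embeddings along the
    gradient sign (FGSM), then train on clean + adversarial inputs."""
    embeds = embeds.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeds), labels)
    grad, = torch.autograd.grad(loss, embeds)
    adv_embeds = (embeds + epsilon * grad.sign()).detach()

    clean_loss = F.cross_entropy(model(embeds.detach()), labels)
    adv_loss = F.cross_entropy(model(adv_embeds), labels)
    return clean_loss + adv_loss  # caller backprops and steps the optimizer

# Usage with a toy classifier over flattened embeddings (hypothetical shapes).
model = torch.nn.Sequential(torch.nn.Flatten(1), torch.nn.Linear(16 * 32, 2))
embeds, labels = torch.randn(4, 16, 32), torch.randint(0, 2, (4,))
loss = fgsm_adversarial_step(model, embeds, labels)
loss.backward()
print(loss.item())
```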
In recent years, the field of Natural Language Processing has seen significant progress thanks to the introduction of the Transformer architecture.