The goal of this kind of adversarial attack is to modify the input text so that it fools a classifier while preserving the original meaning.
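A minimal sketch of the word-substitution flavour of such attacks; `classify` and `SYNONYMS` are hypothetical stand-ins (a real attack would query the target model and draw candidates from a thesaurus or embedding neighbours), so this only illustrates the greedy search, not any specific published method:

```python
import math

# Hypothetical stand-ins: any probability-returning model and any
# meaning-preserving candidate source would do here.
SYNONYMS = {"great": ["decent", "fine"], "love": ["like", "enjoy"]}

def classify(tokens):
    # Toy positive-sentiment scorer in place of a trained classifier.
    score = tokens.count("great") + tokens.count("love")
    return 1.0 / (1.0 + math.exp(-score))

def greedy_attack(tokens, max_swaps=3):
    """Swap words for near-synonyms while the positive score drops."""
    tokens, swaps = list(tokens), 0
    for i, tok in enumerate(tokens):
        if swaps >= max_swaps:
            break
        best = min(SYNONYMS.get(tok, []),
                   key=lambda c: classify(tokens[:i] + [c] + tokens[i + 1:]),
                   default=None)
        if best and classify(tokens[:i] + [best] + tokens[i + 1:]) < classify(tokens):
            tokens[i] = best
            swaps += 1
    return tokens

print(greedy_attack("i love this great movie".split()))
# -> ['i', 'like', 'this', 'decent', 'movie']: the prediction shifts
#    while the wording stays close to the original.
```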
In this paper, we introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
Emergent processes in complex systems such as cellular automata can perform computations of increasing complexity, and could potentially lead to artificial evolution.
One of the main goals of Artificial Life is to research the conditions for the emergence of life, not necessarily as it is, but as it could be.
In order to develop systems capable of modeling artificial life, we need to identify which systems can produce complex behavior.
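As a concrete toy example of such a system, here is a one-dimensional elementary cellular automaton; Rule 110 is known to be Turing-complete, so even this tiny update rule can, in principle, perform arbitrary computation (the width, step count, and single-cell seed are arbitrary choices):

```python
# Elementary cellular automaton: each cell's next state depends only on
# itself and its two neighbours; the 8 possible neighbourhoods index
# into the bits of the rule number.
RULE = 110
width, steps = 64, 32
row = [0] * width
row[width // 2] = 1                      # single seed cell

for _ in range(steps):
    print("".join("#" if c else "." for c in row))
    row = [(RULE >> (4 * row[(i - 1) % width]
                     + 2 * row[i]
                     + row[(i + 1) % width])) & 1
           for i in range(width)]
```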
Online Continual Learning (OCL) studies learning over a continuous data stream without observing any single example more than once, a setting that is closer to the experience of humans and systems that must learn "in the wild".
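A common OCL baseline (not necessarily this paper's method) is experience replay with reservoir sampling: each stream example is seen exactly once, while a small buffer keeps an approximately uniform sample of the stream for rehearsal. In the sketch below, `update` is a hypothetical callback performing one training step on a batch:

```python
import random

class ReservoirBuffer:
    def __init__(self, capacity):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example   # kept with prob capacity/seen

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def train_on_stream(stream, update, buffer, replay_size=8):
    for example in stream:               # single pass: no example repeats
        update([example] + buffer.sample(replay_size))
        buffer.add(example)
```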
An explanatory model for the emergence of evolvable units must display emerging structures that (1) preserve themselves in time, (2) self-reproduce, and (3) tolerate a certain amount of variation when reproducing.
In this paper, we focus on the problem of adapting word vector-based models to new textual data.
Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
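When the map between the two spaces is constrained to be orthogonal, the alignment has a closed-form solution (the orthogonal Procrustes problem), which is the standard tool in this line of work. A numpy sketch, with a synthetic rotation standing in for real cross-lingual vectors:

```python
import numpy as np

# Given paired vectors X (source) and Y (target) for a seed lexicon,
# the orthogonal map W minimising ||W X - Y||_F is W = U V^T,
# where U S V^T is the SVD of Y X^T.
def procrustes(X, Y):
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

rng = np.random.default_rng(0)
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # hidden rotation
X = rng.normal(size=(50, 200))                       # 200 seed pairs
W = procrustes(X, W_true @ X)
print(np.allclose(W, W_true))                        # True: map recovered
```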
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.
Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.
This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.
The Differential State Framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models.
With machine learning successfully applied to daunting new problems almost every day, general AI is starting to look like an attainable goal.
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.
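One standard route to such compression, which this line of work builds on, is product quantization of the embedding matrix: each vector is split into sub-vectors, and only a one-byte codebook index per sub-vector is stored instead of the floats. A sketch (the sub-vector count, codebook size, and use of scipy's k-means are choices made here, not taken from the paper):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def pq_compress(E, n_sub=4, n_codes=256):
    """E: (n_words, dim) embedding matrix, dim divisible by n_sub."""
    codebooks, codes = [], []
    for chunk in np.split(E, n_sub, axis=1):
        centroids, labels = kmeans2(chunk, n_codes, minit="points", seed=0)
        codebooks.append(centroids)
        codes.append(labels.astype(np.uint8))      # 1 byte per sub-vector
    return codebooks, np.stack(codes, axis=1)      # (n_words, n_sub)

def pq_decompress(codebooks, codes):
    return np.hstack([cb[codes[:, j]] for j, cb in enumerate(codebooks)])

E = np.random.default_rng(0).normal(size=(10000, 64)).astype(np.float32)
codebooks, codes = pq_compress(E)          # 256 bytes/word -> 4 bytes/word
print(np.mean((E - pq_decompress(codebooks, codes)) ** 2))  # reconstruction error
```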
A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
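A sketch of that scheme, following common fastText conventions (the word wrapped in `<`/`>` boundary markers, n-grams of length 3 to 6, n-grams hashed into a fixed table); the table here is far smaller than the roughly 2M buckets used in practice, and Python's built-in `hash` stands in for fastText's FNV hash:

```python
import numpy as np

N_BUCKETS, DIM = 100_000, 50
ngram_table = np.random.default_rng(0).normal(scale=0.1, size=(N_BUCKETS, DIM))

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"                      # boundary markers
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    # Word vector = sum of its (hashed) character n-gram vectors.
    return sum(ngram_table[hash(g) % N_BUCKETS] for g in char_ngrams(word))

# Out-of-vocabulary words still get sensible vectors because they share
# n-grams with seen words, e.g. "unhappiness" shares "<un", "happ", "ness>".
v = word_vector("unhappiness")
```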
This paper explores a simple and efficient baseline for text classification.
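A minimal sketch of such a baseline, assuming the usual recipe: average a document's token embeddings, then apply a linear layer and a softmax, trained by plain SGD (the real model adds n-gram features and a hierarchical softmax, both omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, CLASSES = 1000, 32, 2
E = rng.normal(scale=0.1, size=(VOCAB, DIM))       # token embeddings
W = np.zeros((DIM, CLASSES))                       # linear classifier

def predict_proba(token_ids):
    z = E[token_ids].mean(axis=0) @ W              # bag-of-tokens average
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

def sgd_step(token_ids, label, lr=0.1):
    global W
    h = E[token_ids].mean(axis=0)
    p = predict_proba(token_ids)
    p[label] -= 1.0                                # d(cross-entropy)/dz
    grad_h = W @ p                                 # dL/dh, using the old W
    W -= lr * np.outer(h, p)
    E[token_ids] -= lr * grad_h / len(token_ids)   # backprop into embeddings
```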
We present an approach for learning simple algorithms such as copying, multi-digit addition, and single-digit multiplication directly from examples.
One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.
In this paper, we show that learning longer-term patterns in real data, such as natural language, is entirely possible using gradient descent.
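A sketch of the structurally constrained idea as I read it: alongside the fast hidden units, a small context layer updates as a leaky average of the input, so information decays slowly by construction and longer-term dependencies survive ordinary gradient descent. The matrix roles and fixed decay below are my reconstruction of the recurrence, and the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_CTX, ALPHA = 50, 40, 10, 0.95

A = rng.normal(scale=0.1, size=(D_HID, D_IN))   # input  -> hidden
R = rng.normal(scale=0.1, size=(D_HID, D_HID))  # hidden -> hidden
B = rng.normal(scale=0.1, size=(D_CTX, D_IN))   # input  -> context
P = rng.normal(scale=0.1, size=(D_HID, D_CTX))  # context -> hidden

def step(x, h, s):
    s = ALPHA * s + (1.0 - ALPHA) * (B @ x)     # slow, leaky context units
    h = 1.0 / (1.0 + np.exp(-(A @ x + R @ h + P @ s)))  # fast hidden units
    return h, s
```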
Sentiment analysis is a common task in natural language processing that aims to detect the polarity of a text document (typically a consumer review).
Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models.
In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.
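A sketch of that two-stage recipe: label embeddings come from a separately trained word-vector model, and a map from image features into that space is fit afterwards. Ridge regression serves here as a simple stand-in for the learned transformation (the papers in this line typically train the map with a ranking or similarity loss), and all data below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, n_labels = 500, 64, 50, 10
F = rng.normal(size=(n, d_img))                  # image features
L = rng.normal(size=(n_labels, d_txt))           # pre-trained label embeddings
T = L[rng.integers(0, n_labels, size=n)]         # target embedding per image

lam = 1.0                       # ridge map: min ||F M - T||^2 + lam ||M||^2
M = np.linalg.solve(F.T @ F + lam * np.eye(d_img), F.T @ T)

def predict(f):
    e = f @ M                                    # project image into text space
    sims = L @ e / (np.linalg.norm(L, axis=1) * np.linalg.norm(e) + 1e-9)
    return int(np.argmax(sims))                  # nearest label embedding
```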
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling.
Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories.
Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
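The phrase-finding score from this work is simple enough to state directly: bigrams whose words co-occur far more often than chance are merged into single tokens, using score(a, b) = (count(ab) - delta) / (count(a) * count(b)), where delta discounts rare bigrams. A sketch on a toy corpus (delta and the threshold are tuning choices):

```python
from collections import Counter

def find_phrases(tokens, delta=1, threshold=0.1):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, n_ab in bigrams.items()
            if (n_ab - delta) / (unigrams[pair[0]] * unigrams[pair[1]]) > threshold}

corpus = ("new york is big . san francisco is foggy . "
          "new york and san francisco .").split()
print(find_phrases(corpus))
# -> {('new', 'york'), ('san', 'francisco')}: frequent bigrams survive
#    the discount; incidental pairs like ('is', 'big') do not.
```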
We propose two novel model architectures for computing continuous vector representations of words from very large data sets.
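A compact sketch of one of the two, the skip-gram model, which predicts context words from the centre word (the other, CBOW, predicts the centre word from its averaged context). Training here uses negative sampling for brevity, a refinement introduced in the follow-up work; the corpus and hyper-parameters are toys:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, LR, NEG, WIN = len(vocab), 16, 0.05, 3, 2

W_in = rng.normal(scale=0.1, size=(V, D))        # word vectors
W_out = np.zeros((V, D))                         # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for i, w in enumerate(corpus):
        for j in range(max(0, i - WIN), min(len(corpus), i + WIN + 1)):
            if i == j:
                continue
            center, pos = idx[w], idx[corpus[j]]
            pairs = [(pos, 1.0)] + [(rng.integers(V), 0.0) for _ in range(NEG)]
            for t, label in pairs:
                g = sigmoid(W_in[center] @ W_out[t]) - label
                grad_out = g * W_in[center]
                W_in[center] -= LR * g * W_out[t]
                W_out[t] -= LR * grad_out
# After training, the rows of W_in are the learned word vectors.
```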
There are two widely known issues with properly training Recurrent Neural Networks: the vanishing and the exploding gradient problems, detailed in Bengio et al. (1994).
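The exploding half of the problem has a standard remedy proposed alongside this analysis: rescale the gradient whenever its norm exceeds a threshold, so the update direction is kept but its magnitude is bounded. A minimal sketch (the threshold is a hyper-parameter):

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)   # same direction, bounded length
    return grad

g = np.array([30.0, -40.0])                # norm 50, far past the threshold
print(clip_gradient(g))                    # -> [ 3. -4.], norm exactly 5
```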