LREC 2018

SentEval: An Evaluation Toolkit for Universal Sentence Representations

LREC 2018 facebookresearch/InferSent

We introduce SentEval, a toolkit for evaluating the quality of universal sentence representations.

NATURAL LANGUAGE INFERENCE

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

LREC 2018 bheinzerling/bpemb

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).

ENTITY TYPING TOKENIZATION WORD EMBEDDINGS

Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

LREC 2018 akanimax/T2F

To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus.

Advances in Pre-Training Distributed Word Representations

LREC 2018 RaRe-Technologies/gensim-data

Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.

NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System

LREC 2018 TellinaTool/nl2bash

We present new data and semantic parsing methods for the problem of mapping English sentences to Bash commands (NL2Bash).

SEMANTIC PARSING

Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German

LREC 2018 Kyubyong/quasi-rnn

The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written.

MACHINE TRANSLATION