The results demonstrate that the position of emojis in a text is a useful cue for improving the performance of emoji label prediction.
Previous studies on the timeline summarization (TLS) task ignored the interaction between sentences and dates and adopted pre-defined, non-learnable representations for them.
This paper proposes a new evaluation framework, Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems.
Encoder-decoder models have been commonly used for many tasks such as machine translation and response generation.
Neural sequence-to-sequence (Seq2Seq) models and BERT have achieved substantial improvements in abstractive document summarization (ADS) without and with pre-training, respectively.
Discourse segmentation and sentence-level discourse parsing play important roles in various NLP tasks that need to take textual coherence into account.
Sentence extractive summarization shortens a document by selecting a subset of its sentences as a summary while preserving its important content.
Ranked #4 on Extractive Text Summarization on CNN / Daily Mail
These models estimate word boundaries from a character sequence.
Ranked #2 on Thai Word Segmentation on BEST-2010
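As a rough illustration of this setup, a character-level tagger can predict, for each character, whether a word boundary starts there. The sketch below uses a BiLSTM in PyTorch; it shows the general framing only, and all names are hypothetical rather than taken from the paper above.

```python
# Minimal sketch of character-level word-boundary tagging with a BiLSTM.
# Illustrates the general setup, not the paper's exact architecture.
import torch
import torch.nn as nn

class BoundaryTagger(nn.Module):
    def __init__(self, n_chars: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        # Two labels per character: 1 = a word boundary starts here, 0 = not.
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.emb(char_ids))
        return self.out(h)  # (batch, seq_len, 2) logits

# Toy usage: predict boundary labels for one 5-character sequence.
model = BoundaryTagger(n_chars=100)
logits = model(torch.randint(0, 100, (1, 5)))
boundaries = logits.argmax(-1)  # (1, 5) tensor of 0/1 boundary labels
```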
Character-aware neural language models can capture the relationship between words by exploiting character-level information and are particularly effective for languages with rich morphology.
Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG.
Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity of Knowledge Graph (KG) datasets.
This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity, given a caption and a table containing knowledge related to the entity.
Our model employs a lattice structure to handle segmentation alternatives and uses graph neural networks with an attention mechanism to extract multi-granularity representations from the lattice, complementing the character representations.
Ranked #1 on Chinese Word Segmentation on CTB6 (using extra training data)
Pre-trained seq2seq models have achieved state-of-the-art results in the grammatical error correction task.
To promote further development of RST-style discourse parsing models, we need a strong baseline that can serve as a reference for reporting reliable experimental results.
Ranked #1 on Discourse Parsing on Instructional-DT (Instr-DT)
In this article, we review recent advances in subsampling methods for knowledge graph embedding (KGE), starting from the original method used in word2vec.
To solve this problem, we theoretically analyze the NS loss to assist hyperparameter tuning and to clarify how to use the NS loss more effectively in KGE learning.
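For reference, word2vec's original subsampling discards frequent tokens at training time with a probability based on their relative corpus frequency; this is the standard formulation from the original word2vec paper, reproduced here for orientation.

```latex
% word2vec's original subsampling: token w is discarded from training
% with probability P(w), where f(w) is its relative frequency in the
% corpus and t is a small threshold (around 1e-5 in the original paper).
P_{\mathrm{discard}}(w) = 1 - \sqrt{\frac{t}{f(w)}}
```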
Writing an ad text that attracts people and persuades them to click or act is essential for the success of search engine advertising.
On the other hand, properties of the NS loss function that are considered important for learning, such as the relationship between the noise distribution and the number of negative samples, have not been investigated theoretically.
In summary, our contributions are (1) a new dataset for numerical table-to-text generation, consisting of pairs of a table and a descriptive paragraph requiring rich inference, collected from scientific papers, and (2) a table-to-text generation framework enriched with numerical reasoning.
In knowledge graph embedding, the theoretical relationship between the softmax cross-entropy and negative sampling loss functions has not been investigated.
Ranked #14 on Link Prediction on FB15k-237
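For orientation, the two objectives under comparison can be written as follows for a scoring function $s_\theta(x, y)$ over a query $x$ and a candidate entity $y$; this is the standard textbook formulation of the two losses, not a statement of the paper's analysis.

```latex
% Softmax cross-entropy over candidate entities y' for a query x:
\ell_{\mathrm{SCE}} = -\log \frac{\exp(s_\theta(x, y))}{\sum_{y'} \exp(s_\theta(x, y'))}

% Negative sampling (NS) loss with k negatives drawn from a noise
% distribution p_n, where \sigma is the logistic sigmoid:
\ell_{\mathrm{NS}} = -\log \sigma(s_\theta(x, y))
  - \sum_{i=1}^{k} \mathbb{E}_{y_i \sim p_n}\!\left[\log \sigma(-s_\theta(x, y_i))\right]
```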
Although there are many studies on neural language generation (NLG), few systems have been deployed in the real world, especially in the advertising domain.
We then pre-train a neural RST parser with the obtained silver data and fine-tune it on the RST-DT.
Ranked #2 on Discourse Parsing on RST-DT (using extra training data)
The task of generating weather-forecast comments from meteorological simulations has the following requirements: (i) the changes in numerical values for various physical quantities need to be considered, (ii) the weather comments should be dependent on delivery time and area information, and (iii) the comments should provide useful information for users.
This work presents multi-modal deep SVDD (mSVDD) for one-class text classification.
We therefore propose a method for extracting interesting relationships between persons from natural language texts by focusing on their surprisingness.
Numerical tables are widely used to present experimental results in scientific papers.
We propose a simple and effective method for incorporating word clusters into the Continuous Bag-of-Words (CBOW) model.
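A minimal sketch of one way such an injection could look, assuming cluster IDs come from an external clustering such as Brown clustering: each context position contributes a cluster embedding alongside its word embedding. This is an illustrative variant, not necessarily the paper's exact method, and all names are hypothetical.

```python
# Illustrative CBOW variant with cluster embeddings: the center word is
# predicted from averaged word + cluster embeddings of the context.
import torch
import torch.nn as nn

class ClusterCBOW(nn.Module):
    def __init__(self, vocab: int, n_clusters: int, dim: int = 100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.cluster_emb = nn.Embedding(n_clusters, dim)
        self.out = nn.Linear(dim, vocab)

    def forward(self, ctx_words, ctx_clusters):
        # Average word and cluster embeddings over the context window.
        h = (self.word_emb(ctx_words) + self.cluster_emb(ctx_clusters)).mean(dim=1)
        return self.out(h)  # logits over the vocabulary for the center word

model = ClusterCBOW(vocab=5000, n_clusters=256)
logits = model(torch.randint(0, 5000, (2, 4)), torch.randint(0, 256, (2, 4)))
```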
We propose neural models that can normalize text by considering the similarities of word strings and sounds.
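As a non-neural illustration of the underlying intuition, candidate normalizations can be ranked by combining surface edit distance with edit distance over pronunciations. The pronunciation lexicon and weighting below are invented for the example; the proposed models themselves are neural.

```python
# Toy ranking of normalization candidates by string + sound similarity.
def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via dynamic programming (one row).
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def score(src, cand, pron, alpha=0.5):
    # Lower is better: weighted sum of surface and phonetic distances.
    return (alpha * edit_distance(src, cand)
            + (1 - alpha) * edit_distance(pron[src], pron[cand]))

# Hypothetical pronunciation lexicon and candidates.
pron = {"tmrw": "tumaro", "tomorrow": "tumarow", "tomato": "tumeyto"}
cands = ["tomorrow", "tomato"]
best = min(cands, key=lambda c: score("tmrw", c, pron))  # -> "tomorrow"
```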
To obtain better discourse dependency trees, we need to improve the accuracy of RST trees at the upper parts of the structures.
Ranked #3 on Discourse Parsing on RST-DT
Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words.
Ranked #1 on Sentence Compression on Google Dataset
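For concreteness, deletion-based compression is often framed as sequence labeling: each token receives a keep/delete label and the compression is the kept subsequence. The toy labels below are hand-written; in practice a trained tagger (e.g., an LSTM labeler) produces them.

```python
# Deletion-based compression as keep/delete sequence labeling.
def compress(tokens, keep_labels):
    return [t for t, k in zip(tokens, keep_labels) if k]

tokens = "the cat , which was very old , slept".split()
labels = [1, 1, 0, 0, 0, 0, 0, 0, 1]  # keep "the cat slept"
print(" ".join(compress(tokens, labels)))  # -> "the cat slept"
```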
The first one builds the optimal tree in terms of a dissimilarity score function that is defined for splitting a text span into smaller ones.
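A simplified greedy variant of this idea is sketched below: recursively split a span at the point that maximizes a dissimilarity score between the two resulting sub-spans. The dissimilarity function here is a placeholder, and finding the truly optimal tree rather than a greedy one is the substance of the actual method.

```python
# Greedy top-down tree building over a span of elementary units.
def build_tree(units, i, j, dissimilarity):
    if j - i == 1:
        return units[i]  # leaf: a single elementary unit
    # Choose the split point that best separates the two sub-spans.
    k = max(range(i + 1, j),
            key=lambda k: dissimilarity(units[i:k], units[k:j]))
    return (build_tree(units, i, k, dissimilarity),
            build_tree(units, k, j, dissimilarity))

# Toy usage with a hypothetical dissimilarity: prefer balanced splits.
toy = lambda a, b: -abs(len(a) - len(b))
print(build_tree(["e1", "e2", "e3", "e4"], 0, 4, toy))
# -> (('e1', 'e2'), ('e3', 'e4'))
```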
We present neural machine translation models that translate a sentence within a text by using a graph-based encoder that can explicitly consider coreference relations provided within the text.
Our injection method can also be used together with previous methods.
To incorporate discourse tree structure into neural network-based summarizers, we propose a discourse-aware neural extractive summarizer that can explicitly take into account the discourse dependency tree structure of the source document.
This paper investigates the construction of a strong baseline based on general-purpose sequence-to-sequence models for constituency parsing.
Ranked #16 on Constituency Parsing on Penn Treebank
To solve this problem, we propose a higher-order syntactic attention network (HiSAN) that can handle higher-order dependency features as an attention distribution on LSTM hidden states.
Ranked #3 on Sentence Compression on Google Dataset
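One way to picture dependency features as attention, sketched below under assumptions of our own: a row-stochastic matrix over positions acts as a soft parent distribution, and chaining it yields a soft grandparent distribution, i.e., a second-order feature. This illustrates the general idea, not HiSAN's exact formulation.

```python
# Soft dependency features from an attention matrix over token positions.
import torch

def soft_dependencies(scores: torch.Tensor):
    A = torch.softmax(scores, dim=-1)   # (n, n): A[i, j] ~ P(head of i is j)
    A2 = A @ A                          # chained: soft grandparent distribution
    return A, A2

scores = torch.randn(5, 5)              # e.g., head-selection scores
parent, grandparent = soft_dependencies(scores)
```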
The sequence-to-sequence (Seq2Seq) model has been successfully applied to machine translation (MT).