In terms of where layer normalization (LN) is placed, Transformer architectures can be categorized into two types: Post-LN and Pre-LN.
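As a minimal sketch of the Post-LN / Pre-LN distinction, the two residual orderings can be written as below. The helper names (`layer_norm`, `post_ln_block`, `pre_ln_block`, `toy_sublayer`) are illustrative assumptions and not taken from the paper. The layer normalization here omits the learnable gain and bias, and the sublayer is a toy stand-in for attention or a feed-forward network.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Post-LN: apply LN after the residual addition, LN(x + F(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x, sublayer):
    """Pre-LN: apply LN to the sublayer input only, x + F(LN(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [2.0 * v for v in x]


x = [1.0, 2.0, 3.0]
post_out = post_ln_block(x, toy_sublayer)  # output is re-normalized
pre_out = pre_ln_block(x, toy_sublayer)    # residual path is left unnormalized
```

In the Pre-LN form the input `x` reaches the output through an identity path, while in the Post-LN form every block output passes through normalization.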
Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, enabling high-quality 3D reconstruction and novel view synthesis from image observations.
Interpretable rationales for model predictions are crucial in practical applications.
Interpretable rationales for model predictions play a critical role in practical applications.
The proposed method, ALONE (all word embeddings from one), constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable.
Ranked #3 on Text Summarization on DUC 2004 Task 1
Data augmentation by mixing samples, such as Mixup, has been widely used, typically for classification tasks.
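For reference, standard Mixup for classification can be sketched as follows. This illustrates the general Mixup technique (a convex combination of two inputs and their one-hot labels, with the mixing weight drawn from a Beta distribution), not the specific method of the paper listed here. The function name `mixup` and the default `alpha=0.2` are assumptions chosen for illustration.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Mix two (input, one-hot label) pairs with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy samples from different classes (hypothetical data).
rng = random.Random(0)
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

The mixed label stays a valid probability distribution because the two weights sum to one.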
Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation.
A remedy for this is to train an agent with real-time feedback from a human observer who immediately gives rewards for some actions.
Just as PMI is derived from mutual information, we derive this new measure from the Hilbert-Schmidt independence criterion (HSIC); thus, we call the new measure the pointwise HSIC (PHSIC).
We stochastically replace words with other words that are predicted by a bi-directional language model at the word positions.
This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner.
We propose a novel, data-driven, and stylistically consistent dialog response generation system.
In this paper, we propose the first comprehensive system that can handle unconstrained spoken language and is able to effectively resolve ambiguity in spoken instructions.
This study addresses the problem of identifying the meaning of unknown words or entities in a discourse with respect to the word embedding approaches used in neural language models.
In this paper we present a system that performs this task using a supervised binary classifier on top of a recurrent neural network to predict the probability that a given story ending is correct.