Quantum Statistics-Inspired Neural Attention

17 Sep 2018 · Aristotelis Charalampous, Sotirios Chatzis

Sequence-to-sequence (encoder-decoder) models with attention constitute a cornerstone of deep learning research, as they have enabled unprecedented sequential data modeling capabilities. This effectiveness largely stems from the capacity of these models to infer salient temporal dynamics over long horizons; these are encoded into the obtained neural attention (NA) distributions. However, existing NA formulations essentially constitute point-wise selection mechanisms over the observed source sequences; that is, the computation of attention weights relies on the assumption that each source sequence element is independent of the rest. Although convenient, this assumption fails to account for higher-order dependencies that may be prevalent in real-world data. This paper addresses these limitations by leveraging quantum-statistical modeling arguments. Specifically, our work broadens the notion of NA by accounting for the case in which the NA model is inherently incapable of distinguishing between individual source elements, which we assume arises from higher-order temporal dynamics. In such cases, we postulate that selection may be feasible only at the level of pairs of source sequence elements. To this end, we cast NA as inference of an attention density matrix (ADM) approximation. We derive effective training and inference algorithms, and evaluate our approach in the context of a machine translation (MT) application. We perform experiments with challenging benchmark datasets and show that our approach yields favorable outcomes in terms of several evaluation metrics.
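
To make the contrast between point-wise and pair-level selection concrete, the following is a minimal NumPy sketch. It is not the ADM construction from the paper, which the abstract does not specify. It assumes plain dot-product attention scores, and the function names (`pointwise_attention`, `pure_state_density_matrix`) are hypothetical. The second function builds one simple kind of density matrix (a symmetric, positive semidefinite, unit-trace matrix) whose diagonal reproduces the point-wise weights and whose off-diagonal entries attach a weight to each pair of source positions.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def pointwise_attention(query, keys):
    """Standard point-wise attention: one weight per source element.

    Assumes dot-product scoring; keys has shape (T, d), query has shape (d,).
    """
    scores = keys @ query
    return softmax(scores)


def pure_state_density_matrix(weights):
    """Illustrative only, not the paper's ADM.

    Builds rho = a a^T with a = sqrt(weights), so that rho is symmetric,
    positive semidefinite, and has trace 1. The diagonal equals the
    point-wise weights; entry (i, j) couples source positions i and j.
    """
    amplitudes = np.sqrt(weights)
    return np.outer(amplitudes, amplitudes)


# Hypothetical toy data: 5 source positions with 8-dimensional encodings.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
query = rng.normal(size=8)

weights = pointwise_attention(query, keys)
rho = pure_state_density_matrix(weights)
print("point-wise weights:", np.round(weights, 3))
print("trace of rho:", np.trace(rho))
```

In this toy construction the off-diagonal entries are fully determined by the diagonal, so it adds no modeling capacity; it only shows the shape of the object (a T-by-T matrix rather than a length-T vector) that a pair-level formulation would need to infer.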

Tasks

Machine Translation
