Shape of Elephant: Study of Macro Properties of Word Embeddings Spaces

13 Jun 2021  ·  Alexey Tikhonov ·

Pre-trained word representations became a key component in many NLP tasks. However, the global geometry of the word embeddings remains poorly understood. In this paper, we demonstrate that a typical word embeddings cloud is shaped as a high-dimensional simplex with interpretable vertices and propose a simple yet effective method for enumeration of these vertices. We show that the proposed method can detect and describe vertices of the simplex for GloVe and fasttext spaces.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods