Encoding Category Trees Into Word-Embeddings Using Geometric Approach
We present a novel method to implicitly encode a tree-structured category information into word-embeddings, resulting in super-dimensional ball representations ($n$-ball embedding for short). Inclusion relations among $n$-balls precisely encode subordinate relations among categories. The cosine similarity function is enriched by category information. A large $n$-ball dataset is constructed using geometrical method, which achieves zero energy cost in embedding tree structures into word embedding. A new benchmark dataset is created for predicting the category of unknown words. Experiments show that $n$-ball embeddings, carried with category information, significantly out-perform word-embeddings in the neighbourhood test, while only slightly change the original word-embeddings. Experiment results also show that $n$-ball embeddings demonstrate surprisingly good performance in validating the category of unknown word. Source codes and data-sets are free for public access \url{https://github.com/gnodisnait/nball4tree.git} and \url{https://github.com/gnodisnait/bp94nball.git}.
PDF Abstract