Non-Linearity in Mapping Based Cross-Lingual Word Embeddings
Recent work on cross-lingual word embeddings has mainly focused on linear-mapping-based approaches, in which pre-trained word embeddings are mapped into a shared vector space using a linear transformation. However, such approaches rest on a key assumption: words with similar meanings share similar geometric arrangements across their monolingual word embeddings, which suggests that there is a linear relationship between languages. This assumption may not hold for all language pairs across all semantic concepts. We investigate whether non-linear mappings can better describe the relationship between different languages by utilising kernel Canonical Correlation Analysis (KCCA). Experimental results on five language pairs show an improvement over current state-of-the-art results in both supervised and self-learning scenarios, confirming that non-linear mapping is a better way to describe the relationship between languages.
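To make the linearity assumption concrete, the following is a minimal sketch on synthetic data, not the paper's KCCA method or its experimental setup. It fits an unconstrained least-squares linear map between two toy "embedding" spaces and compares how well that map fits when the true relationship is linear with how well it fits when the relationship is non-linear. The function names (`fit_linear_map`, `relative_residual`), the random data, and the `tanh` warp are all assumptions introduced purely for illustration.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Least-squares W minimising ||src @ W - tgt||_F (hypothetical helper)."""
    w, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return w


def relative_residual(src, tgt):
    """Relative Frobenius error left over after the best linear map."""
    w = fit_linear_map(src, tgt)
    return np.linalg.norm(src @ w - tgt) / np.linalg.norm(tgt)


rng = np.random.default_rng(0)
dim, n_pairs = 4, 200

# Toy "source language" vectors for the words in a seed dictionary.
src = rng.standard_normal((n_pairs, dim))

# A random orthogonal matrix stands in for the true cross-lingual relationship.
rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

# Case 1: the target space is an exact linear image of the source space.
tgt_linear = src @ rotation

# Case 2: an element-wise non-linear warp is applied before the rotation
# (an assumed stand-in for a relationship that is not linear).
tgt_warped = np.tanh(src) @ rotation

err_linear = relative_residual(src, tgt_linear)
err_warped = relative_residual(src, tgt_warped)

print(f"linear relationship:     residual = {err_linear:.3e}")
print(f"non-linear relationship: residual = {err_warped:.3e}")
```

In the first case the linear map recovers the relationship up to numerical precision; in the second a substantial residual remains no matter which linear map is chosen. Closing that kind of gap is what a non-linear method such as KCCA is intended to do.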