Improved Word Embeddings with Implicit Structure Information
Distributed word representation is an efficient method for capturing semantic and syntactic word relations. In this work, we introduce an extension to the continuous bag-of-words model for learning word representations efficiently by using implicit structure information. Instead of relying on a syntactic parser which might be noisy and slow to build, we compute weights representing probabilities of syntactic relations based on the Huffman softmax tree in an efficient heuristic. The constructed {``}implicit graphs{''} from these weights show that these weights contain useful implicit structure information. Extensive experiments performed on several word similarity and word analogy tasks show gains compared to the basic continuous bag-of-words model.
PDF Abstract COLING 2016 PDF COLING 2016 Abstract