SBWCE (Spanish Billion Word Corpus and Embeddings)

This resource consists of an unannotated corpus of the Spanish language of nearly 1.5 billion words, compiled from different corpora and resources from the web; and a set of word vectors (or embeddings), created from this corpus using the word2vec algorithm, provided by the gensim package. These embeddings were evaluated by translating to Spanish word2vec’s word relation test set.

Source: SBWCE

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


License


  • Unknown

Modalities


Languages