DaSciM (Data Science and Mining) part of LIX at Ecole Polytechnique, established in 2013 and since then producing research results in the area of large scale data analysis via methods of machine and deep learning.
We make our model publicly available in the transformers library with the aim of promoting future research in analytic tasks for French tweets.
Adding that pretrained word vectors on huge text corpus achieved high performance in many different NLP tasks.
We employ an alignment-based approach to compare these embeddings with a general-purpose Twitter embedding unrelated to COVID-19.
Artificial Intelligence techniques are already popular and important in the legal domain.
A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word.
Word embeddings are undoubtedly very useful components in many NLP tasks.