Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions.
Despite the relevance of social media, the maturity of NLP in this domain pales in comparison with general-purpose models, metrics, and benchmarks.
The success of online social platforms hinges on their ability to predict and understand user behavior at scale.
This paper introduces a large collection of time series data derived from Twitter, postprocessed using word embedding techniques, as well as specialized fine-tuned language models.
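To make the idea of embedding-derived time series concrete, here is a minimal sketch, with made-up two-dimensional vectors and dates standing in for real pretrained embeddings and tweet streams: each day's tweets are averaged into one vector, and cosine similarity between days tracks semantic drift over time.

```python
from math import sqrt

# Toy word vectors standing in for pretrained embeddings (illustrative values only).
EMBEDDINGS = {
    "market": [0.9, 0.1], "crash": [0.8, -0.5],
    "rally":  [0.7, 0.6], "calm":   [0.1, 0.9],
}

def mean_embedding(tokens):
    """Average the embeddings of the known tokens in a batch of tweets."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# One aggregated vector per day yields a time series over the embedding space.
days = {
    "2022-06-01": ["market", "rally", "calm"],
    "2022-06-02": ["market", "crash"],
}
series = {day: mean_embedding(tokens) for day, tokens in days.items()}
drift = cosine(series["2022-06-01"], series["2022-06-02"])
```

A low similarity between consecutive days would flag a shift in what is being discussed; a real pipeline would use high-dimensional embeddings and far larger daily samples.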
Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER).
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic.
Social media platforms host discussions about a wide variety of topics that arise every day.
To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift.
1 code implementation • 29 Jun 2022 • Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, Francesco Barbieri
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media.
In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos.
Despite its importance, the time variable has been largely neglected in the NLP and language model literature.
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention.
Prediction bias in machine learning models, i.e., undesirable model behaviors that discriminate against inputs mentioning or produced by certain groups, has drawn increasing attention from the research community given its societal impact.
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years.
Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks such as text classification and coreference resolution.
The experimental landscape in natural language processing for social media is too fragmented.
Cross-lingual embeddings represent the meaning of words from different languages in the same vector space.
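A shared vector space enables translation by nearest-neighbour retrieval: the closest target-language vector to a source word is its likely translation. The sketch below illustrates this with a hypothetical, hand-picked English/Spanish space; the vocabulary and vector values are invented for the example, not taken from a real alignment method.

```python
from math import sqrt

# Toy shared space: English and Spanish words embedded with (made-up)
# 3-d vectors; translation pairs sit close together by construction.
SHARED_SPACE = {
    ("en", "dog"):   [0.90, 0.10, 0.00],
    ("en", "house"): [0.00, 0.90, 0.20],
    ("es", "perro"): [0.88, 0.12, 0.02],
    ("es", "casa"):  [0.05, 0.92, 0.18],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def translate(word, src, tgt):
    """Return the target-language word nearest to `word` in the shared space."""
    query = SHARED_SPACE[(src, word)]
    candidates = [(w, v) for (lang, w), v in SHARED_SPACE.items() if lang == tgt]
    return max(candidates, key=lambda wv: cosine(query, wv[1]))[0]
```

With these toy vectors, `translate("dog", "en", "es")` retrieves `"perro"`; real cross-lingual spaces are learned, e.g., by mapping monolingual embeddings into a common space with a linear transformation.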
Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role.
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018.
The frequent use of Emojis on social media platforms has created a new form of multimodal social interaction.
Videogame streaming platforms have become a paramount example of noisy user-generated text.
Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate.