no code implementations • 29 Nov 2024 • Dimosthenis Antypas, Indira Sen, Carla Perez-Almendros, Jose Camacho-Collados, Francesco Barbieri
The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material.
no code implementations • 4 Oct 2024 • Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Jose Camacho-Collados
In the dynamic realm of social media, diverse topics are discussed daily, transcending linguistic boundaries.
1 code implementation • 6 Jun 2024 • Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten W. Bos, Björn Ross, Mirella Lapata, Francesco Barbieri
Content moderators play a key role in keeping the conversation on social media healthy.
1 code implementation • 20 Mar 2024 • Zhihan Zhou, Qixiang Fang, Leonardo Neves, Francesco Barbieri, Yozen Liu, Han Liu, Maarten W. Bos, Ron Dotsch
Furthermore, we introduce a novel training objective named future W-behavior prediction to transcend the limitations of next-token prediction by forecasting a broader horizon of upcoming user behaviors.
no code implementations • 27 Feb 2024 • Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang
Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on average, over up to 35 sessions.
no code implementations • 19 Dec 2023 • Qixiang Fang, Zhihan Zhou, Francesco Barbieri, Yozen Liu, Leonardo Neves, Dong Nguyen, Daniel L. Oberski, Maarten W. Bos, Ron Dotsch
Learning general-purpose user representations based on user behavioral logs is an increasingly popular user modeling approach.
1 code implementation • 15 Nov 2023 • Zhihan Zhang, Dong-Ho Lee, Yuwei Fang, Wenhao Yu, Mengzhao Jia, Meng Jiang, Francesco Barbieri
Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions.
no code implementations • 23 Oct 2023 • Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Leonardo Neves, Kiamehr Rezaee, Luis Espinosa-Anke, Jiaxin Pei, Jose Camacho-Collados
Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks.
no code implementations • 23 Oct 2023 • Heinrich Peters, Yozen Liu, Francesco Barbieri, Raiyan Abdul Baten, Sandra C. Matz, Maarten W. Bos
The success of online social platforms hinges on their ability to predict and understand user behavior at scale.
no code implementations • 4 Aug 2023 • Daniel Loureiro, Kiamehr Rezaee, Talayeh Riahi, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
This paper introduces a large collection of time series data derived from Twitter, postprocessed using word embedding techniques, as well as specialized fine-tuned language models.
1 code implementation • 7 Oct 2022 • Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER).
no code implementations • 3 Oct 2022 • Jiaxin Pei, Vítor Silva, Maarten Bos, Yozen Liu, Leonardo Neves, David Jurgens, Francesco Barbieri
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic.
no code implementations • COLING 2022 • Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri
Social media platforms host discussions about a wide variety of topics that arise every day.
1 code implementation • COLING 2022 • Daniel Loureiro, Aminette D'Souza, Areej Nasser Muhajab, Isabella A. White, Gabriel Wong, Luis Espinosa Anke, Leonardo Neves, Francesco Barbieri, Jose Camacho-Collados
To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift.
1 code implementation • 29 Jun 2022 • Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, Francesco Barbieri
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media.
1 code implementation • CVPR 2022 • Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov
In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos.
2 code implementations • ACL 2022 • Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
Despite its importance, the time variable has been largely neglected in the NLP and language model literature.
1 code implementation • LREC 2022 • Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados
Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention.
Ranked #2 on Sentiment Analysis on TweetEval
no code implementations • 1 Jan 2021 • Xisen Jin, Francesco Barbieri, Leonardo Neves, Xiang Ren
Prediction bias in machine learning models, referring to undesirable model behaviors that discriminate against inputs mentioning or produced by certain groups, has drawn increasing attention from the research community given its societal impact.
1 code implementation • COLING 2020 • Brihi Joshi, Neil Shah, Francesco Barbieri, Leonardo Neves
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years.
no code implementations • NAACL 2021 • Xisen Jin, Francesco Barbieri, Brendan Kennedy, Aida Mostafazadeh Davani, Leonardo Neves, Xiang Ren
Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks such as text classification and coreference resolution.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke
The experimental landscape in natural language processing for social media is too fragmented.
Ranked #3 on Sentiment Analysis on TweetEval
1 code implementation • 17 May 2019 • Jose Camacho-Collados, Yerai Doval, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert
Cross-lingual embeddings represent the meaning of words from different languages in the same vector space.
no code implementations • EMNLP 2018 • Francesco Barbieri, Luis Espinosa-Anke, Jose Camacho-Collados, Steven Schockaert, Horacio Saggion
Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role.
1 code implementation • SEMEVAL 2018 • Francesco Barbieri, Jose Camacho-Collados
Our analyses reveal that some stereotypes related to skin color and gender seem to be reflected in the use of these modifiers.
no code implementations • SEMEVAL 2018 • Francesco Barbieri, Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa-Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, Horacio Saggion
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018.
no code implementations • 2 May 2018 • Francesco Barbieri, Luis Marujo, Pradeep Karuturi, William Brendel, Horacio Saggion
The frequent use of emojis on social media platforms has created a new form of multimodal social interaction.
1 code implementation • NAACL 2018 • Francesco Barbieri, Miguel Ballesteros, Francesco Ronzano, Horacio Saggion
Emojis are small images that are commonly included in social media text messages.
no code implementations • WS 2017 • Francesco Barbieri, Luis Espinosa-Anke, Miguel Ballesteros, Juan Soler-Company, Horacio Saggion
Videogame streaming platforms have become a paramount example of noisy user-generated text.
3 code implementations • EACL 2017 • Francesco Barbieri, Miguel Ballesteros, Horacio Saggion
Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message.
no code implementations • LREC 2016 • Francesco Barbieri, Francesco Ronzano, Horacio Saggion
Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate.
no code implementations • LREC 2014 • Francesco Barbieri, Horacio Saggion
We propose in this paper a new set of experiments to assess the relevance of the features included in our model.