no code implementations • 1 Dec 2017 • Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze
We are working on a corpus of "how-to" videos from the web; the idea is that an object that can be seen ("car") or a scene that is detected ("kitchen") can be used to condition both models on the "context" of the recording, thereby reducing perplexity and improving transcription.
no code implementations • NAACL 2018 • Seungwhan Moon, Leonardo Neves, Vitor Carvalho
We introduce a new task called Multimodal Named Entity Recognition (MNER) for noisy user-generated data such as tweets or Snapchat captions, which comprise short text with accompanying images.
no code implementations • ACL 2018 • Seungwhan Moon, Leonardo Neves, Vitor Carvalho
We introduce the new Multimodal Named Entity Disambiguation (MNED) task for multimodal social media posts, such as Snapchat or Instagram captions, which pair short text with accompanying images.
no code implementations • ACL 2018 • Di Lu, Leonardo Neves, Vitor Carvalho, Ning Zhang, Heng Ji
Everyday billions of multimodal posts containing both images and text are shared in social media sites such as Snapchat, Twitter or Instagram.
no code implementations • NAACL 2019 • Lahari Poddar, Leonardo Neves, William Brendel, Luis Marujo, Sergey Tulyakov, Pradeep Karuturi
Leveraging the assumption that learning the topic of a bug is a sub-task of detecting duplicates, we design a loss function that can jointly perform both tasks but needs supervision only for duplicate classification, achieving topic clustering in an unsupervised fashion.
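The snippet above describes a joint objective with supervision on only one of two tasks. A minimal sketch of that idea (all names, the entropy-based clustering term, and the `alpha` weight are hypothetical illustrations, not the paper's actual loss):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def joint_loss(dup_logits, dup_labels, topic_logits, alpha=0.1):
    """Hypothetical joint objective: supervised duplicate classification
    plus an unsupervised entropy penalty that sharpens topic assignments
    (no topic labels are ever used)."""
    # Supervised part: binary cross-entropy on duplicate labels.
    p = 1.0 / (1.0 + np.exp(-dup_logits))
    bce = -np.mean(dup_labels * np.log(p + 1e-9)
                   + (1 - dup_labels) * np.log(1 - p + 1e-9))
    # Unsupervised part: low entropy -> confident topic clusters.
    q = softmax(topic_logits)
    entropy = -np.mean(np.sum(q * np.log(q + 1e-9), axis=-1))
    return bce + alpha * entropy
```

Minimizing the entropy term pushes topic probabilities toward one-hot assignments, so clustering emerges as a by-product of the supervised duplicate signal.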
2 code implementations • 5 Sep 2019 • Wenxuan Zhou, Hongtao Lin, Bill Yuchen Lin, Ziqi Wang, Junyi Du, Leonardo Neves, Xiang Ren
The soft matching module learns to match rules with semantically similar sentences, so that raw corpora can be automatically labeled and leveraged by the RE module (with much better coverage) as augmented supervision, in addition to the exactly matched sentences.
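The core of soft matching can be sketched as embedding-space nearest-rule assignment with a confidence threshold. This is a simplified illustration under assumed names (`soft_match`, precomputed sentence/rule vectors, a fixed cosine threshold), not the paper's learned matcher:

```python
import numpy as np

def soft_match(sentence_vecs, rule_vecs, threshold=0.8):
    """Hypothetical soft matcher: label an unlabeled sentence with the
    nearest rule's relation when their embeddings are cosine-similar
    enough; otherwise leave it unlabeled (-1)."""
    s = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    r = rule_vecs / np.linalg.norm(rule_vecs, axis=1, keepdims=True)
    sims = s @ r.T                    # (num_sentences, num_rules)
    best_rule = sims.argmax(axis=1)
    best_sim = sims.max(axis=1)
    return np.where(best_sim >= threshold, best_rule, -1)
```

Sentences that fall below the threshold stay unlabeled, which is how a soft matcher avoids flooding the relation-extraction module with noisy pseudo-labels.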
1 code implementation • ICLR 2020 • Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren
While deep neural networks have achieved impressive performance on a range of NLP tasks, these data-hungry models heavily rely on labeled data, which restricts their applications in scenarios where data annotation is expensive.
no code implementations • ACL 2020 • Dong-Ho Lee, Rahul Khanna, Bill Yuchen Lin, Jamin Chen, Seyeon Lee, Qinyuan Ye, Elizabeth Boschee, Leonardo Neves, Xiang Ren
Successfully training a deep neural network demands a huge corpus of labeled data.
2 code implementations • 11 Jun 2020 • Tong Zhao, Yozen Liu, Leonardo Neves, Oliver Woodford, Meng Jiang, Neil Shah
Our work shows that neural edge predictors can effectively encode class-homophilic structure, promoting intra-class edges and demoting inter-class edges in a given graph. Our main contribution, the GAug graph data augmentation framework, leverages these insights to improve GNN-based node classification via edge prediction.
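The augmentation step described above can be sketched as: given an edge predictor's probabilities over node pairs, add the most confident missing edges and drop the least confident existing ones. The function below is a hypothetical, simplified version of that idea (the real framework trains the edge predictor jointly; here the probabilities are assumed given):

```python
import numpy as np

def augment_graph(adj, edge_probs, add_top=1, drop_bottom=1):
    """Hypothetical GAug-style step on an undirected graph:
    add the highest-probability non-edges, drop the
    lowest-probability existing edges."""
    adj = adj.copy()
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)      # each undirected pair once
    probs = edge_probs[iu]
    present = adj[iu].astype(bool)
    # Add the most likely missing edges (promote intra-class links).
    missing = np.where(~present)[0]
    for idx in missing[np.argsort(probs[missing])[::-1][:add_top]]:
        i, j = iu[0][idx], iu[1][idx]
        adj[i, j] = adj[j, i] = 1
    # Drop the least likely existing edges (demote inter-class links).
    existing = np.where(present)[0]
    for idx in existing[np.argsort(probs[existing])[:drop_bottom]]:
        i, j = iu[0][idx], iu[1][idx]
        adj[i, j] = adj[j, i] = 0
    return adj
```

A GNN is then trained on the modified adjacency, which tends to be more class-homophilic than the original graph.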
Ranked #1 on Node Classification on Flickr
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke
The experimental landscape in natural language processing for social media is too fragmented.
Ranked #3 on Sentiment Analysis on TweetEval
1 code implementation • WNUT (ACL) 2021 • Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio
Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context.
no code implementations • NAACL 2021 • Xisen Jin, Francesco Barbieri, Brendan Kennedy, Aida Mostafazadeh Davani, Leonardo Neves, Xiang Ren
Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks such as text classification and coreference resolution.
1 code implementation • COLING 2020 • Brihi Joshi, Neil Shah, Francesco Barbieri, Leonardo Neves
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years.
no code implementations • 1 Jan 2021 • Xisen Jin, Francesco Barbieri, Leonardo Neves, Xiang Ren
Prediction bias in machine learning models, i.e., undesirable model behaviors that discriminate against inputs mentioning or produced by certain groups, has drawn increasing attention from the research community given its societal impact.
1 code implementation • NAACL (SocialNLP) 2021 • Shuguang Chen, Leonardo Neves, Thamar Solorio
Performance of neural models for named entity recognition degrades over time, becoming stale.
1 code implementation • EMNLP 2021 • Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio
Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models.
2 code implementations • ACL 2022 • Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
Despite its importance, the time variable has been largely neglected in the NLP and language model literature.
1 code implementation • 29 Jun 2022 • Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves, Francesco Barbieri
In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media.
1 code implementation • COLING 2022 • Daniel Loureiro, Aminette D'Souza, Areej Nasser Muhajab, Isabella A. White, Gabriel Wong, Luis Espinosa Anke, Leonardo Neves, Francesco Barbieri, Jose Camacho-Collados
To bridge this gap, we present TempoWiC, a new benchmark especially aimed at accelerating research in social media-based meaning shift.
no code implementations • COLING 2022 • Dimosthenis Antypas, Asahi Ushio, Jose Camacho-Collados, Leonardo Neves, Vítor Silva, Francesco Barbieri
Social media platforms host discussions about a wide variety of topics that arise every day.
no code implementations • 3 Oct 2022 • Jiaxin Pei, Vítor Silva, Maarten Bos, Yozen Liu, Leonardo Neves, David Jurgens, Francesco Barbieri
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic.
1 code implementation • 7 Oct 2022 • Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER).
1 code implementation • 14 Oct 2022 • Shuguang Chen, Leonardo Neves, Thamar Solorio
In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios.
no code implementations • 4 Aug 2023 • Daniel Loureiro, Kiamehr Rezaee, Talayeh Riahi, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados
This paper introduces a large collection of time series data derived from Twitter, postprocessed using word embedding techniques, as well as specialized fine-tuned language models.
no code implementations • 16 Sep 2023 • Shuguang Chen, Leonardo Neves, Thamar Solorio
In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks.
no code implementations • 23 Oct 2023 • Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Leonardo Neves, Kiamehr Rezaee, Luis Espinosa-Anke, Jiaxin Pei, Jose Camacho-Collados
Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks.
no code implementations • 19 Dec 2023 • Qixiang Fang, Zhihan Zhou, Francesco Barbieri, Yozen Liu, Leonardo Neves, Dong Nguyen, Daniel L. Oberski, Maarten W. Bos, Ron Dotsch
Using this new framework, we design a Transformer-based user model that can produce high-quality general-purpose user representations for instant messaging platforms like Snapchat.
no code implementations • 20 Mar 2024 • Zhihan Zhou, Qixiang Fang, Leonardo Neves, Francesco Barbieri, Yozen Liu, Han Liu, Maarten W. Bos, Ron Dotsch
Furthermore, we introduce a novel training objective named future W-behavior prediction to transcend the limitations of next-token prediction by forecasting a broader horizon of upcoming user behaviors.
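The objective above replaces single next-token prediction with prediction over a window of upcoming behaviors. A minimal sketch of how such targets could be built (the function name, multi-hot encoding, and window semantics are assumptions for illustration, not the paper's exact formulation):

```python
import numpy as np

def future_behavior_targets(events, num_behaviors, window):
    """Hypothetical target builder for future W-behavior prediction:
    at each step t, the target is a multi-hot vector over all behavior
    types occurring in the next `window` steps, rather than only the
    single next event."""
    T = len(events)
    targets = np.zeros((T, num_behaviors))
    for t in range(T):
        for b in events[t + 1 : t + 1 + window]:
            targets[t, b] = 1.0
    return targets
```

A model trained against such targets (e.g., with a per-behavior binary loss) is pushed to anticipate a broader horizon of user activity instead of just the immediate next action.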