Search Results for author: Thamar Solorio

Found 86 papers, 23 papers with code

Normalization and Back-Transliteration for Code-Switched Data

no code implementations NAACL (CALCS) 2021 Dwija Parikh, Thamar Solorio

Code-switching is an omnipresent phenomenon in multilingual communities all around the world but remains a challenge for NLP systems due to the lack of proper data and processing techniques.

named-entity-recognition Named Entity Recognition +3

Interpreting Themes from Educational Stories

2 code implementations 8 Apr 2024 Yigeng Zhang, Fabio A. González, Thamar Solorio

Reading comprehension continues to be a crucial research focus in the NLP community.

Machine Reading Comprehension

NLP Progress in Indigenous Latin American Languages

no code implementations 8 Apr 2024 Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio

The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements.

Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations

1 code implementation 3 Apr 2024 Emilio Villa-Cueva, A. Pastor López-Monroy, Fernando Sánchez-Vega, Thamar Solorio

Zero-Shot Cross-lingual Transfer (ZS-XLT) utilizes a model trained in a source language to make predictions in another language, often with a performance loss.

text-classification Text Classification +1

Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering

no code implementations 16 Feb 2024 David Romero, Thamar Solorio

We present Q-ViD, a simple approach for video question answering (video QA). Unlike prior methods, which are based on complex architectures, computationally expensive pipelines, or closed models like GPTs, Q-ViD relies on a single instruction-aware open vision-language model (InstructBLIP) to tackle video QA using frame descriptions.

Language Modelling Large Language Model +2

Positive and Risky Message Assessment for Music Products

1 code implementation 18 Sep 2023 Yigeng Zhang, Mahsa Shafaei, Fabio A. González, Thamar Solorio

In this work, we introduce a pioneering research challenge: evaluating positive and potentially harmful messages within music products.

Context-aware Adversarial Attack on Named Entity Recognition

no code implementations 16 Sep 2023 Shuguang Chen, Leonardo Neves, Thamar Solorio

In recent years, large pre-trained language models (PLMs) have achieved remarkable performance on many natural language processing benchmarks.

Adversarial Attack named-entity-recognition +1

SafeWebUH at SemEval-2023 Task 11: Learning Annotator Disagreement in Derogatory Text: Comparison of Direct Training vs Aggregation

1 code implementation 1 May 2023 Sadat Shahriar, Thamar Solorio

Subjectivity and difference of opinion are key social phenomena, and it is crucial to take these into account in the annotation and detection process of derogatory textual content.

Distillation of encoder-decoder transformers for sequence labelling

no code implementations 10 Feb 2023 Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan İrsoy, Thamar Solorio

Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models.

Few-Shot Learning Hallucination

The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges

1 code implementation 19 Dec 2022 Genta Indra Winata, Alham Fikri Aji, Zheng-Xin Yong, Thamar Solorio

Code-Switching, a common phenomenon in written text and conversation, has been studied over decades by the natural language processing (NLP) research community.

Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

1 code implementation 14 Oct 2022 Shuguang Chen, Leonardo Neves, Thamar Solorio

In this work, we take the named entity recognition task in the English language as a case study and explore style transfer as a data augmentation method to increase the size and diversity of training data in low-resource scenarios.

Data Augmentation named-entity-recognition +4

Survey of Aspect-based Sentiment Analysis Datasets

1 code implementation 11 Apr 2022 Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio

Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews to determine: a) the target entity being reviewed, b) the high-level aspect to which it belongs, and c) the sentiment expressed toward the targets and the aspects.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA)

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data

no code implementations 19 Feb 2022 Shuguang Chen, Gustavo Aguilar, Anirudh Srinivasan, Mona Diab, Thamar Solorio

For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions.

Language Identification Machine Translation +3

Exploring Conditional Text Generation for Aspect-Based Sentiment Analysis

1 code implementation 5 Oct 2021 Siva Uday Sampreeth Chebolu, Franck Dernoncourt, Nedim Lipka, Thamar Solorio

Aspect-based sentiment analysis (ABSA) is an NLP task that entails processing user-generated reviews to determine (i) the target being evaluated, (ii) the aspect category to which it belongs, and (iii) the sentiment expressed towards the target and aspect pair.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

From None to Severe: Predicting Severity in Movie Scripts

1 code implementation Findings (EMNLP) 2021 Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio

In this paper, we introduce the task of predicting severity of age-restricted aspects of movie content based solely on the dialogue script.

White Paper - Creating a Repository of Objectionable Online Content: Addressing Undesirable Biases and Ethical Considerations

no code implementations 23 Feb 2021 Thamar Solorio, Mahsa Shafaei, Christos Smailis, Isabelle Augenstein, Margaret Mitchell, Ingrid Stapf, Ioannis Kakadiaris

This white paper summarizes the authors' structured brainstorming regarding ethical considerations for creating an extensive repository of online content labeled with tags that describe potentially questionable content for young viewers.

A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers

no code implementations 26 Jan 2021 Mahsa Shafaei, Christos Smailis, Ioannis A. Kakadiaris, Thamar Solorio

In this work, we explore different approaches to combine modalities for the problem of automated age-suitability rating of movie trailers.

White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content

no code implementations 25 Jan 2021 Thamar Solorio, Mahsa Shafaei, Christos Smailis, Mona Diab, Theodore Giannakopoulos, Heng Ji, Yang Liu, Rada Mihalcea, Smaranda Muresan, Ioannis Kakadiaris

This white paper presents a summary of the discussions regarding critical considerations to develop an extensive repository of online videos annotated with labels indicating questionable content.

Learning to Emphasize: Dataset and Shared Task Models for Selecting Emphasis in Presentation Slides

no code implementations 2 Jan 2021 Amirreza Shirani, Giai Tran, Hieu Trinh, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio

We evaluate a range of state-of-the-art models on this novel dataset by organizing a shared task and inviting multiple researchers to model emphasis in this new domain.

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality

no code implementations Findings (EMNLP) 2021 Gustavo Aguilar, Bryan McCann, Tong Niu, Nazneen Rajani, Nitish Keskar, Thamar Solorio

To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT.

SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media

no code implementations SEMEVAL 2020 Amirreza Shirani, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio

In this paper, we present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media.


LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation

no code implementations LREC 2020 Gustavo Aguilar, Sudipta Kar, Thamar Solorio

To facilitate research in this direction, we propose a centralized benchmark for Linguistic Code-switching Evaluation (LinCE) that combines ten corpora covering four different code-switched language pairs (i.e., Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic) and four tasks (i.e., language identification, named entity recognition, part-of-speech tagging, and sentiment analysis).

Language Identification named-entity-recognition +4

Let Me Choose: From Verbal Context to Font Selection

2 code implementations ACL 2020 Amirreza Shirani, Franck Dernoncourt, Jose Echevarria, Paul Asente, Nedim Lipka, Thamar Solorio

In this paper, we aim to learn associations between visual attributes of fonts and the verbal context of the texts they are typically applied to.

Detecting Early Signs of Cyberbullying in Social Media

no code implementations LREC 2020 Niloofar Safi Samghabadi, Adrián Pastor López Monroy, Thamar Solorio

We also investigate the possibility of designing a framework to monitor the streams of users' online messages and detect the signs of cyberbullying as early as possible.

Abusive Language

Aggression and Misogyny Detection using BERT: A Multi-Task Approach

1 code implementation LREC 2020 Niloofar Safi Samghabadi, Parth Patwa, Srinivas PYKL, Prerana Mukherjee, Amitava Das, Thamar Solorio

In recent times, the focus of the NLP community has increased towards offensive language, aggression, and hate-speech detection. This paper presents our system for the TRAC-2 shared task on "Aggression Identification" (sub-task A) and "Misogynistic Aggression Identification" (sub-task B).

Abusive Language Aggression Identification +3

From English to Code-Switching: Transfer Learning with Strong Morphological Clues

1 code implementation ACL 2020 Gustavo Aguilar, Thamar Solorio

We show the effectiveness of this transfer learning step by outperforming multilingual BERT and homologous CS-unaware ELMo models and establishing a new state of the art in CS tasks, such as NER and POS tagging.

Language Identification NER +4

Multi-view Story Characterization from Movie Plot Synopses and Reviews

no code implementations EMNLP 2020 Sudipta Kar, Gustavo Aguilar, Mirella Lapata, Thamar Solorio

This paper considers the problem of characterizing stories by inferring properties such as theme and style using written synopses and reviews of movies.


Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media

no code implementations NAACL 2018 Gustavo Aguilar, A. Pastor López-Monroy, Fabio A. González, Thamar Solorio

Our systems outperform the current F1 scores of the state of the art on the Workshop on Noisy User-generated Text 2017 dataset by 2.45% and 3.69%, establishing a more suitable approach for social media environments.

Named Entity Recognition (NER) Word Embeddings

Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task

no code implementations WS 2018 Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.

named-entity-recognition Named Entity Recognition +2

Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models

no code implementations 3 May 2019 Amirreza Shirani, Bowen Xu, David Lo, Thamar Solorio, Amin Alipour

The proposed dataset Stack Overflow is a useful resource to develop novel solutions, specifically data-hungry neural network models, for the prediction of relatedness in technical community question-answering forums.

Community Question Answering Multi-class Classification

Folksonomication: Predicting Tags for Movies from Plot Synopses Using Emotion Flow Encoded Neural Network

no code implementations COLING 2018 Sudipta Kar, Suraj Maharjan, Thamar Solorio

Folksonomy of movies covers a wide range of heterogeneous information about movies, like the genre, plot structure, visual experiences, soundtracks, metadata, and emotional experiences from watching a movie.


UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering

no code implementations SEMEVAL 2016 Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, Paolo Rosso

In this work we describe the system built for the three English subtasks of the SemEval 2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center - Universitat Politècnica de València: UH-PRHLT.

Community Question Answering Knowledge Graphs

Language Identification and Analysis of Code-Switched Social Media Text

no code implementations WS 2018 Deepthi Mave, Suraj Maharjan, Thamar Solorio

In this paper, we detail our work on comparing different word-level language identification systems for code-switched Hindi-English data and a standard Spanish-English dataset.

Language Identification Machine Translation

Detecting Nastiness in Social Media

no code implementations WS 2017 Niloofar Safi Samghabadi, Suraj Maharjan, Alan Sprague, Raquel Diaz-Sprague, Thamar Solorio

Although social media has made it easy for people to connect on a virtually unlimited basis, it has also opened doors to people who misuse it to undermine, harass, humiliate, threaten and bully others.

RiTUAL-UH at SemEval-2017 Task 5: Sentiment Analysis on Financial Data Using Neural Networks

no code implementations SEMEVAL 2017 Sudipta Kar, Suraj Maharjan, Thamar Solorio

In this paper, we present our systems for the "SemEval-2017 Task-5 on Fine-Grained Sentiment Analysis on Financial Microblogs and News".

Sentiment Analysis

Gated Multimodal Units for Information Fusion

9 code implementations 7 Feb 2017 John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González

The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities.

General Classification Genre classification

CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings

no code implementations WS 2016 Mohammed Attia, Suraj Maharjan, Younes Samih, Laura Kallmeyer, Thamar Solorio

The evaluation result of our system on the test set is 88.1% (79.0% for TRUE only) f-measure for Task-1 on detecting semantic similarity, and 76.0% (42.3% when excluding RANDOM) for Task-2 on identifying finer-grained semantic relations.

Binary Classification General Classification +7

Evaluation of YTEX and MetaMap for clinical concept recognition

no code implementations 7 Feb 2014 John David Osborne, Binod Gyawali, Thamar Solorio

We used MetaMap and YTEX as a basis for the construction of two separate systems to participate in the 2013 ShARe/CLEF eHealth Task 1 [9], the recognition of clinical concepts.
