We fine-tune TAPAS (a model which extends BERT's architecture to capture tabular structure) for both the subtasks as it has shown state-of-the-art performance in various table understanding tasks.
Hence, we collect and label two datasets with 3,102 and 3,509 social media posts from Twitter and Gab respectively.
Emotion prediction is a critical task in the field of Natural Language Processing (NLP).
We chose to participate only in Task A, which dealt with Sentiment Classification and which we formulated as a text classification problem.
While the extensive popularity of online social media platforms has made information dissemination faster, it has also resulted in widespread online abuse of different types like hate speech, offensive language, sexist and racist opinions, etc.
Studies have shown musical engagement to be an indirect representation of internal states including internalized symptomatology and depression.
In this paper, we provide an alternate perspective on word representations, by reinterpreting the dimensions of the vector space of a word embedding as a collection of features.
To the best of our knowledge, this is the first attempt towards generating full-length natural answers from a graph input (confusion network).
Ellipsis resolution has been identified as an important step to improve the accuracy of mainstream Natural Language Processing (NLP) tasks such as information retrieval, event extraction, dialog systems, etc.
The advent of social media has immensely increased the volume of opinions and arguments voiced on the internet.
The available Paninian dependency treebank(s) for Telugu is annotated only with inter-chunk dependency relations and not all words of a sentence are part of the parse tree.
In this paper, we present the Hindi TimeBank, an ISO-TimeML annotated reference corpus for the detection and classification of events, states and time expressions, and the links between them.
Spoken dialogue systems typically use a list of top-N ASR hypotheses for inferring the semantic meaning and tracking the state of the dialogue.
Traditional methods for deep NLG adopt pipeline approaches comprising stages such as constructing syntactic input, predicting function words, linearizing the syntactic input and generating the surface forms.
Ranked #1 on Data-to-Text Generation on SR11Deep
A reading comprehension system extracts a span of text, comprising named entities, dates, small phrases, etc., which serves as the answer to a given question.
The Collective Encoder captures the overall sentiment of the sentence, while the Specific Encoder utilizes an attention mechanism in order to focus on individual sentiment-bearing sub-words.
We present a Telugu-English code-mixed corpus with the corresponding named entity tags.
Sentiment Analysis and other semantic tasks are commonly used for social media textual analysis to gauge public opinion and make sense of the noise on social media.
This paper describes our system (Fermi) for Task 6: OffensEval: Identifying and Categorizing Offensive Language in Social Media of SemEval-2019.
This information is highly useful for segregating factual questions from non-factual ones, which helps organize the questions into useful categories and trims down the problem space for the next task in the pipeline for fact evaluation among the available answers.
This paper describes our system (Fermi) for Task 5 of SemEval-2019: HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter.
We present four new datasets for this task, two multiclass datasets with 550 and 1159 problems each and two multilabel datasets having 3737 and 3960 problems each.
SimpleQuestions is a commonly used benchmark for single-factoid question answering (QA) over Knowledge Graphs (KG).
In this paper, we introduce a deep learning based classification system for Facebook posts and comments of Hindi-English Code-Mixed text to detect the aggressive behaviour of/towards users.
We generate sub-word level embeddings of the title using Convolutional Neural Networks and use them to train a bidirectional LSTM architecture.
In this paper, we leverage social media platforms such as Twitter for developing corpora across multiple languages.
With the help of the created parallel corpus, we analyze the structure of English-Hindi code-mixed data and present a technique to augment run-of-the-mill machine translation (MT) approaches that can help achieve superior translations without the need for specially designed translation systems.
Named Entity Recognition (NER) is a major task in the field of Natural Language Processing (NLP) and is also a sub-task of Information Extraction.
Natural Language Generation (NLG) is a research task which addresses the automatic generation of natural language text representative of an input non-linguistic collection of knowledge.
This paper presents a system that automatically generates multiple, natural language questions using relative pronouns and relative adverbs from complex English sentences.
Our network is trained only on English questions provided in this dataset and noisy Hindi translations of these questions, and can answer English-Hindi CM questions effectively without the need for translation into English.
In this paper, we analyze the task of author's gender prediction in code-mixed content and present a corpus of English-Hindi texts collected from Twitter which is annotated with author's gender.
In this paper, we analyze the task of humor detection in texts and describe a freely available corpus containing English-Hindi code-mixed tweets annotated with humorous (H) or non-humorous (N) tags.
The model learns the representation of resource-poor and resource-rich sentences in a common space by using the similarity between their assigned annotation tags.
Hate speech detection in social media texts is an important Natural Language Processing task, which has several crucial applications like sentiment analysis, investigating cyberbullying and examining socio-political controversies.
Emotion Prediction is a Natural Language Processing (NLP) task dealing with detection and classification of emotions in various monolingual and bilingual texts.
Social media has become one of the main channels for people to communicate and share their views with society.
Social media platforms like Twitter and Facebook have become two of the largest mediums used by people to express their views towards different topics.
We present a treebank of Hindi-English code-switching tweets under Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
Code-mixed data poses an important challenge for natural language processing because its characteristics differ completely from the traditional structures of standard languages.
Social media platforms such as Twitter and Facebook are becoming popular in multilingual societies.
Machine learning approaches in sentiment analysis principally rely on the abundance of resources.
The model learns the representations of resource-poor and resource-rich language in a common emoji space by using a similarity metric based on the emojis present in sentences from both languages.
CREDO consists of different modules for capturing various features responsible for the credibility of an article.
Our model handles the problem of data scarcity which is faced by many languages in the world and yields improved word embeddings for words in the target language by relying on transformed embeddings of words of the source language.
We evaluate our method using small-sized training sets on eleven test sets for the word similarity task across seven languages.
The worldstate and the query are processed separately in two different networks and finally, the networks are merged to predict the final operation.
Word embeddings learned from text corpus can be improved by injecting knowledge from external resources, while at the same time also specializing them for similarity or relatedness.
We present an unsupervised, language agnostic approach for exploiting morphological regularities present in high dimensional vector spaces.
With the advent of word representations, word similarity tasks are becoming increasingly popular as an evaluation metric for the quality of the representations.
In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data.
Given a question-answer pair along with its metadata, the DFFN architecture independently a) learns features from the Deep Neural Network (DNN) and b) computes hand-crafted features using various external resources, and then combines them using a fully connected neural network trained to predict the final answer quality.
We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform empirical analysis comparing the suitability and performance of various state-of-the-art SA methods in social media.
In this paper, we describe an end-to-end neural model for Named Entity Recognition (NER) which is based on Bi-Directional RNN-LSTM.
Current AQP systems either a) learn models using various hand-crafted features (HCF) or b) use deep learning (DL) techniques which automatically learn the required feature representations.
Most of the existing LID systems rely on modeling the language discriminative information from low-level acoustic features.