Search Results for author: Animesh Mukherjee

Found 81 papers, 31 papers with code

A Data Bootstrapping Recipe for Low-Resource Multilingual Relation Classification

no code implementations CoNLL (EMNLP) 2021 Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Data collection is challenging for Indian languages, because they are syntactically and morphologically diverse, as well as different from resource-rich languages like English.

Classification Relation +1

Antitrust, Amazon, and Algorithmic Auditing

no code implementations 27 Mar 2024 Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Jens Frankenreiter, Stefan Bechtold, Krishna P. Gummadi

In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life.

On Zero-Shot Counterspeech Generation by LLMs

1 code implementation 22 Mar 2024 Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann, Animesh Mukherjee

In terms of prompting, we find that our proposed strategies help in improving counter speech generation across all the models.

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

no code implementations 8 Mar 2024 Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs.

Transliteration

DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem

1 code implementation 25 Feb 2024 Somnath Banerjee, Avik Dutta, Aaditya Agrawal, Rima Hazra, Animesh Mukherjee

With the AI revolution in place, the trend for building automated systems to support professionals in different domains such as the open source software systems, healthcare systems, banking systems, transportation systems and many others has become increasingly prominent.

Active Learning named-entity-recognition +3

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

1 code implementation 23 Feb 2024 Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee

We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses.

Model Editing Response Generation

InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks

no code implementations 22 Feb 2024 Somnath Banerjee, Maulindu Sarkar, Punyajoy Saha, Binny Mathew, Animesh Mukherjee

Second, in a dataset extension exercise, using influence functions to automatically identify data points that have been initially `silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve the model performance.

Sarcasm Detection Stance Classification

Mask-up: Investigating Biases in Face Re-identification for Masked Faces

no code implementations 21 Feb 2024 Siddharth D Jaiswal, Ankit Kr. Verma, Animesh Mukherjee

Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%.

Face Recognition

GRAFFORD: A Benchmark Dataset for Testing the Knowledge of Object Affordances of Language and Vision Models

1 code implementation 20 Feb 2024 Sayantan Adak, Daivik Agrawal, Animesh Mukherjee, Somak Aditya

We investigate the knowledge of object affordances in pre-trained language models (LMs) and pre-trained Vision-Language models (VLMs).

Object

Zero shot VLMs for hate meme detection: Are we there yet?

no code implementations 19 Feb 2024 Naquee Rizwan, Paramananda Bhaskar, Mithun Das, Swadhin Satyaprakash Majhi, Punyajoy Saha, Animesh Mukherjee

In this study, we aim to investigate the efficacy of these visual language models in handling intricate tasks such as hate meme detection.

Zero-Shot Learning

Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi

1 code implementation 11 Feb 2024 Mithun Das, Saurabh Kumar Pandey, Shivansh Sethi, Punyajoy Saha, Animesh Mukherjee

With the rise of online abuse, the NLP community has begun investigating the use of neural architectures to generate counterspeech that can "counter" the vicious tone of such abusive speech and dilute/ameliorate their rippling effect over the social network.

Probing LLMs for hate speech detection: strengths and vulnerabilities

no code implementations 19 Oct 2023 Sarthak Roy, Ashish Harshavardhan, Animesh Mukherjee, Punyajoy Saha

Recently efforts have been made by social media platforms as well as researchers to detect hateful or toxic language using large language models.

Hate Speech Detection

BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification

1 code implementation 18 Oct 2023 Mithun Das, Animesh Mukherjee

Finally, we perform a qualitative error analysis of the misclassified memes of the best-performing text-based, image-based and multimodal models.

Classification Meme Classification

Auditing Gender Analyzers on Text Data

no code implementations 9 Oct 2023 Siddharth D Jaiswal, Ankit Kumar Verma, Animesh Mukherjee

Predictions for non-binary comments on all platforms are mostly female, thus propagating the societal bias that non-binary individuals are effeminate.

Modeling interdisciplinary interactions among Physics, Mathematics & Computer Science

no code implementations 19 Sep 2023 Rima Hazra, Mayank Singh, Pawan Goyal, Bibhas Adhikari, Animesh Mukherjee

Interdisciplinarity has, over recent years, gained tremendous importance and has become one of the key ways of doing cutting-edge research.

Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

no code implementations 12 Sep 2023 Rima Hazra, Agnik Saha, Somnath Banerjee, Animesh Mukherjee

Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries.

Community Question Answering

Personality Detection and Analysis using Twitter Data

no code implementations 11 Sep 2023 Abhilash Datta, Souvic Chakraborty, Animesh Mukherjee

We also perform a series of ablation studies to show how the baselines perform for our dataset.

Marketing

Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities

no code implementations 10 Sep 2023 Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh Mukherjee

To facilitate the task of the moderators, in this work, we have tackled two significant issues for the askubuntu CQA platform: (1) retrieval of duplicate questions given a new question and (2) duplicate question confirmation time prediction.

Community Question Answering Duplicate-Question Retrieval +1

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

1 code implementation 20 Jul 2023 Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

HateMM: A Multi-Modal Dataset for Hate Video Classification

1 code implementation 6 May 2023 Mithun Das, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta, Animesh Mukherjee

Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world.

Classification Hate Speech Detection +1

On the rise of fear speech in online social media

1 code implementation 18 Mar 2023 Punyajoy Saha, Kiran Garimella, Narla Komal Kalyan, Saurabh Kumar Pandey, Pauras Mangesh Meher, Binny Mathew, Animesh Mukherjee

Recently, social media platforms are heavily moderated to prevent the spread of online hate speech, which is usually fertile in toxic words and is directed toward an individual or a community.

Diversity matters: Robustness of bias measurements in Wikidata

1 code implementation 27 Feb 2023 Paramita Das, Sai Keerthana Karnam, Anirban Panda, Bhanu Prakash Reddy Guda, Soumya Sarkar, Animesh Mukherjee

With the widespread use of knowledge graphs (KG) in various automated AI systems and applications, it is very important to ensure that information retrieval algorithms leveraging them are free from societal biases.

Attribute Information Retrieval +3

HateProof: Are Hateful Meme Detection Systems really Robust?

no code implementations 11 Feb 2023 Piush Aggarwal, Pranit Chawla, Mithun Das, Punyajoy Saha, Binny Mathew, Torsten Zesch, Animesh Mukherjee

Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks.

Contrastive Learning

Rationale-Guided Few-Shot Classification to Detect Abusive Language

1 code implementation 30 Nov 2022 Punyajoy Saha, Divyanshu Sheth, Kushal Kedia, Binny Mathew, Animesh Mukherjee

We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets, finding that in the few-shot classification setting, RGFS-based models outperform baseline models by about 7% in macro F1 scores and perform competitively to models finetuned on other source domains.

Abusive Language Classification +1

Hate Speech and Offensive Language Detection in Bengali

1 code implementation 7 Oct 2022 Mithun Das, Somnath Banerjee, Punyajoy Saha, Animesh Mukherjee

To overcome the existing research's limitations, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets.

Hate Speech Detection

Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction

no code implementations 21 Sep 2022 Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee

With the rising participation of the common mass in social media, it is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations.

Decoding Demographic un-fairness from Indian Names

1 code implementation 7 Sep 2022 Medidoddi Vahini, Jalend Bantupalli, Souvic Chakraborty, Animesh Mukherjee

Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems.

Fairness Recommendation Systems

"Dummy Grandpa, do you know anything?": Identifying and Characterizing Ad hominem Fallacy Usage in the Wild

no code implementations 5 Sep 2022 Utkarsh Patel, Animesh Mukherjee, Mainack Mondal

Today, participating in discussions on online forums is extremely commonplace and these discussions have started rendering a strong influence on the overall opinion of online users.

Misinformation

Placing (Historical) Facts on a Timeline: A Classification cum Coref Resolution Approach

1 code implementation 28 Jun 2022 Sayantan Adak, Altaf Ahmad, Aditya Basu, Animesh Mukherjee

A timeline provides one of the most effective ways to visualize the important historical facts that occurred over a period of time, presenting the insights that may not be so apparent from reading the equivalent information in textual form.

coreference-resolution Event Coreference Resolution +2

CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech

1 code implementation 9 May 2022 Punyajoy Saha, Kanishk Singh, Adarsh Kumar, Binny Mathew, Animesh Mukherjee

We generate counterspeech using three datasets and observe significant improvement across different attribute scores.

Attribute

HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

1 code implementation LREC 2022 Mithun Das, Punyajoy Saha, Binny Mathew, Animesh Mukherjee

To enable more targeted diagnostic insights of such multilingual hate speech models, we introduce a set of functionalities for the purpose of evaluation.

Hate Speech Detection

Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages

1 code implementation 26 Apr 2022 Mithun Das, Somnath Banerjee, Animesh Mukherjee

In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages.

Abusive Language

FaiRIR: Mitigating Exposure Bias from Related Item Recommendations in Two-Sided Platforms

1 code implementation 1 Apr 2022 Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

To this end, our experiments on multiple real-world RIR datasets reveal that the existing RIR algorithms often result in very skewed exposure distribution of items, and the quality of items is not a plausible explanation for such skew in exposure.

Alexa, in you, I trust! Fairness and Interpretability Issues in E-commerce Search through Smart Speakers

no code implementations 8 Feb 2022 Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

While investigating for the fairness of the default action, we observe that over a set of as many as 1000 queries, in nearly 68% cases, there exist one or more products which are more relevant (as per Amazon's own desktop search results) than the product chosen by Alexa.

Fairness

Two-Face: Adversarial Audit of Commercial Face Recognition Systems

no code implementations 17 Nov 2021 Siddharth D Jaiswal, Karthikeya Duggirala, Abhisek Dash, Animesh Mukherjee

Computer vision applications like automated face detection are used for a variety of purposes ranging from unlocking smart devices to tracking potential persons of interest for surveillance.

Face Detection Face Recognition +1

Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia

1 code implementation 2 Nov 2021 Paramita Das, Bhanu Prakash Reddy Guda, Sasi Bhusan Seelaboyina, Soumya Sarkar, Animesh Mukherjee

To the best of our knowledge, this is the first work that rigorously explores English Wikipedia article quality life cycle from the perspective of quality indicators and provides a novel unsupervised page level approach to detect quality switch, which can help in automatic content monitoring in Wikipedia thus contributing significantly to the CSCW community.

Change Point Detection Time Series Analysis

A Data Bootstrapping Recipe for Low Resource Multilingual Relation Classification

no code implementations 18 Oct 2021 Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Relation classification (sometimes called 'extraction') requires trustworthy datasets for fine-tuning large language models, as well as for evaluation.

Classification Relation +1

When expertise gone missing: Uncovering the loss of prolific contributors in Wikipedia

no code implementations 21 Sep 2021 Paramita Das, Bhanu Prakash Reddy Guda, Debajit Chakraborty, Soumya Sarkar, Animesh Mukherjee

Success of planetary-scale online collaborative platforms such as Wikipedia is hinged on active and continued participation of its voluntary contributors.

Information Retrieval Retrieval

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

1 code implementation 21 Jul 2021 Srijan Bansal, Vishal Garimella, Ayush Suhane, Animesh Mukherjee

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so as to generalize well in a multilingual setting.

Multilingual Word Embeddings

"Short is the Road that Leads from Fear to Hate": Fear Speech in Indian WhatsApp Groups

2 code implementations 7 Feb 2021 Punyajoy Saha, Binny Mathew, Kiran Garimella, Animesh Mukherjee

We observe that users writing fear speech messages use various events and symbols to create the illusion of fear among the reader about a target community.

When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces

no code implementations 30 Jan 2021 Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Along a number of our proposed bias measures, we find that the sponsored recommendations are significantly more biased toward Amazon private label products compared to organic recommendations.

Fairness

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

6 code implementations 18 Dec 2020 Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.

Hate Speech Detection Text Classification

Gandhipedia: A one-stop AI-enabled portal for browsing Gandhian literature, life-events and his social network

no code implementations 5 Jun 2020 Sayantan Adak, Atharva Vyas, Animesh Mukherjee, Heer Ambavi, Pritam Kadasi, Mayank Singh, Shivam Patel

We introduce an AI-enabled portal that presents an excellent visualization of Mahatma Gandhi's life events by constructing temporal and spatial social networks from the Gandhian literature.

Aspect-based Sentiment Analysis of Scientific Reviews

1 code implementation 5 Jun 2020 Souvic Chakraborty, Pawan Goyal, Animesh Mukherjee

We also investigate the extent of disagreement between the reviewers and the chair and find that the inter-reviewer disagreement may have a link to the disagreement with the chair.

8k Active Learning +2

HateMonitors: Language Agnostic Abuse Detection in Social Media

1 code implementation 27 Sep 2019 Punyajoy Saha, Binny Mathew, Pawan Goyal, Animesh Mukherjee

In this paper, we present our machine learning model, HateMonitor, developed for Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), a shared task at FIRE 2019.

Abuse Detection Abusive Language +1

Competing Topic Naming Conventions in Quora: Predicting Appropriate Topic Merges and Winning Topics from Millions of Topic Pairs

no code implementations 10 Sep 2019 Binny Mathew, Suman Kalyan Maity, Pawan Goyal, Animesh Mukherjee

Our system is also able to predict ~25% of the correct case of merges within the first month of the merge and ~40% of the cases within a year.

Anomaly Detection TAG

On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

no code implementations ACL 2019 Abhik Jana, Dima Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincaré embeddings in addition to the distributional information to detect compositionality for noun phrases.

StRE: Self Attentive Edit Quality Prediction in Wikipedia

1 code implementation ACL 2019 Soumya Sarkar, Bhanu Prakash Reddy, Sandipan Sikdar, Animesh Mukherjee

Wikipedia can easily be justified as a behemoth, considering the sheer volume of content that is added or removed every minute to its several projects.

On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

no code implementations 7 Jun 2019 Abhik Jana, Dmitry Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee

In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincaré embeddings in addition to the distributional information to detect compositionality for noun phrases.

KGPChamps at SemEval-2019 Task 3: A deep learning approach to detect emotions in the dialog utterances.

no code implementations SEMEVAL 2019 Jasabanta Patro, Nitin Choudhary, Kalpit Chittora, Animesh Mukherjee

We report the bidirectional LSTM model, along with the input word embedding as the concatenation of word embedding generated from bidirectional LSTM for word characters and conceptnet embedding, as the best performing model with a highest micro-F1 score of 0.7261.

DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow

1 code implementation 10 Mar 2019 Suman Kalyan Maity, Abhishek Panigrahi, Sayan Ghosh, Arundhati Banerjee, Pawan Goyal, Animesh Mukherjee

In this paper, we develop a content-cum-user based deep learning framework DeepTagRec to recommend appropriate question tags on Stack Overflow.

TAG

A Network-centric Framework for Auditing Recommendation Systems

no code implementations 7 Feb 2019 Abhisek Dash, Animesh Mukherjee, Saptarshi Ghosh

In this work, we propose a novel network-centric framework which is not only able to quantify various static properties of RSs, but also is able to quantify dynamic properties such as how likely RSs are to lead to polarization or segregation of information among their users.

Recommendation Systems

Detecting Reliable Novel Word Senses: A Network-Centric Approach

no code implementations 14 Dec 2018 Abhik Jana, Animesh Mukherjee, Pawan Goyal

The outlined method can therefore be used as a new post-hoc step to improve the precision of novel word sense detection in a robust and reliable way where the underlying framework uses a graph structure.

Analyzing the hate and counter speech accounts on Twitter

no code implementations 6 Dec 2018 Binny Mathew, Navish Kumar, Ravina, Pawan Goyal, Animesh Mukherjee

We also build a supervised model for classifying the hateful and counterspeech accounts on Twitter and obtain an F-score of 0.77.

Social and Information Networks

Spread of hate speech in online social media

no code implementations 4 Dec 2018 Binny Mathew, Ritam Dutt, Pawan Goyal, Animesh Mukherjee

The present online social media platform is afflicted with several issues, with hate speech predominantly at the forefront.

Social and Information Networks

Deep Dive into Anonymity: A Large Scale Analysis of Quora Questions

no code implementations 17 Nov 2018 Binny Mathew, Ritam Dutt, Suman Kalyan Maity, Pawan Goyal, Animesh Mukherjee

In particular, we observe that the choice to post the question as anonymous is dependent on the user's perception of anonymity and they often choose to speak about depression, anxiety, social ties and personal issues under the guise of anonymity.

Deep Learning for Social Media Health Text Classification

no code implementations WS 2018 Santosh Tokala, Vaibhav Gambhir, Animesh Mukherjee

This paper describes the systems developed for 1st and 2nd tasks of the 3rd Social Media Mining for Health Applications Shared Task at EMNLP 2018.

Binary Classification General Classification +4

WikiRef: Wikilinks as a route to recommending appropriate references for scientific Wikipedia pages

no code implementations COLING 2018 Abhik Jana, Pranjal Kanojiya, Pawan Goyal, Animesh Mukherjee

In this paper, we propose a novel two step approach -- WikiRef -- that (i) leverages the wikilinks present in a scientific Wikipedia target page and, thereby, (ii) recommends highly relevant references to be included in that target page appropriately and automatically borrowed from the reference section of the wikilinks.

AppTechMiner: Mining Applications and Techniques from Scientific Articles

no code implementations 10 Sep 2017 Mayank Singh, Soham Dan, Sanyam Agarwal, Pawan Goyal, Animesh Mukherjee

We also categorize individual research articles based on their application areas and the techniques proposed/improved in the article.

Information Retrieval Retrieval

Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media

no code implementations 15 Mar 2017 Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee

We first propose a context-based clustering method to sample a set of candidate words from the social media data. Next, we propose three novel and similar metrics based on the usage of these words by the users in different tweets; these metrics were used to score and rank the candidate words indicating their borrowed likeliness.

Clustering

Language Use Matters: Analysis of the Linguistic Structure of Question Texts Can Characterize Answerability in Quora

no code implementations 11 Mar 2017 Suman Kalyan Maity, Aman Kharb, Animesh Mukherjee

Notably, features representing the language use patterns of the users are most discriminative and alone account for an accuracy of 74.18%.

Which techniques does your application use?: An information extraction framework for scientific articles

no code implementations 23 Aug 2016 Soham Dan, Sanyam Agarwal, Mayank Singh, Pawan Goyal, Animesh Mukherjee

Every field of research consists of multiple application areas with various techniques routinely used to solve problems in this wide range of application areas.

Language Modelling

WASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter

no code implementations 31 Jan 2016 Suman Kalyan Maity, Chaitanya Sarda, Anshit Chaudhary, Abhijeet Patil, Shraman Kumar, Akash Mondal, Animesh Mukherjee

Language in social media is mostly driven by new words and spellings that are constantly entering the lexicon thereby polluting it and resulting in high deviation from the formal written version.

General Classification

That's sick dude!: Automatic identification of word sense change across different timescales

no code implementations ACL 2014 Sunny Mitra, Ritwik Mitra, Martin Riedl, Chris Biemann, Animesh Mukherjee, Pawan Goyal

In this paper, we propose an unsupervised method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books.

Word Sense Disambiguation
