DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem

1 code implementation 25 Feb 2024 Somnath Banerjee, Avik Dutta, Aaditya Agrawal, Rima Hazra, Animesh Mukherjee

With the AI revolution in place, the trend for building automated systems to support professionals in different domains such as open source software systems, healthcare systems, banking systems, transportation systems and many others has become increasingly prominent.

Active Learning named-entity-recognition +3

How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

1 code implementation 23 Feb 2024 Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee

We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses.

Model Editing Response Generation

InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks

no code implementations 22 Feb 2024 Somnath Banerjee, Maulindu Sarkar, Punyajoy Saha, Binny Mathew, Animesh Mukherjee

Second, in a dataset extension exercise, we use influence functions to automatically identify data points that were initially 'silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve model performance.

Sarcasm Detection Stance Classification

Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models

1 code implementation 19 Jan 2024 Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria

In the rapidly advancing field of artificial intelligence, the concept of Red-Teaming or Jailbreaking large language models (LLMs) has emerged as a crucial area of study.

Model Editing

Redefining Developer Assistance: Through Large Language Models in Software Ecosystem

no code implementations 9 Dec 2023 Somnath Banerjee, Avik Dutta, Sayan Layek, Amruit Sahoo, Sam Conrad Joyce, Rima Hazra

In this paper, we delve into the advancement of domain-specific Large Language Models (LLMs) with a focus on their application in software development.

Link Prediction named-entity-recognition +4

Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms

no code implementations 12 Sep 2023 Rima Hazra, Agnik Saha, Somnath Banerjee, Animesh Mukherjee

Community Question Answering (CQA) platforms steadily gain popularity as they provide users with fast responses to their queries.

Community Question Answering

Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities

no code implementations 10 Sep 2023 Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh Mukherjee

To facilitate the task of the moderators, in this work, we have tackled two significant issues for the askubuntu CQA platform: (1) retrieval of duplicate questions given a new question and (2) duplicate question confirmation time prediction.

Community Question Answering Duplicate-Question Retrieval +1

Hate Speech and Offensive Language Detection in Bengali

1 code implementation 7 Oct 2022 Mithun Das, Somnath Banerjee, Punyajoy Saha, Animesh Mukherjee

To overcome the existing research's limitations, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets.

Hate Speech Detection

Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages

1 code implementation 26 Apr 2022 Mithun Das, Somnath Banerjee, Animesh Mukherjee

In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages.

Abusive Language

Abusive and Threatening Language Detection in Urdu using Boosting based and BERT based models: A Comparative Approach

1 code implementation 27 Nov 2021 Mithun Das, Somnath Banerjee, Punyajoy Saha

In this FIRE 2021 shared task - "HASOC- Abusive and Threatening language detection in Urdu" the organizers propose an abusive language detection dataset in Urdu along with threatening language detection.

Abusive Language

LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis

1 code implementation 30 Aug 2020 Somnath Banerjee, Sahar Ghannay, Sophie Rosset, Anne Vilnat, Paolo Rosso

This paper describes the participation of the LIMSI UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.

Sentiment Analysis

JU_ETCE_17_21 at SemEval-2019 Task 6: Efficient Machine Learning and Neural Network Approaches for Identifying and Categorizing Offensive Language in Tweets

1 code implementation SEMEVAL 2019 Preeti Mukherjee, Mainak Pal, Somnath Banerjee, Sudip Kumar Naskar

This paper describes our system submissions as part of our participation (team name: JU_ETCE_17_21) in the SemEval 2019 shared task 6: "OffensEval: Identifying and Categorizing Offensive Language in Social Media".

Language Identification Word Embeddings

NITMZ-JU at IJCNLP-2017 Task 4: Customer Feedback Analysis

no code implementations IJCNLP 2017 Somnath Banerjee, Partha Pakray, Riyanka Manna, Dipankar Das, Alexander Gelbukh

In this paper, we describe a deep learning framework for analyzing the customer feedback as part of our participation in the shared task on Customer Feedback Analysis at the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017).

Text Classification

