Search Results for author: Niloy Ganguly

Found 65 papers, 40 papers with code

A Data Bootstrapping Recipe for Low-Resource Multilingual Relation Classification

no code implementations CoNLL (EMNLP) 2021 Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Data collection is challenging for Indian languages, because they are syntactically and morphologically diverse, as well as different from resource-rich languages like English.

Classification Relation +1

Efficient Continual Pre-training of LLMs for Low-resource Languages

no code implementations 13 Dec 2024 Arijit Nag, Soumen Chakrabarti, Animesh Mukherjee, Niloy Ganguly

On the other hand, continual pre-training (CPT) with large amounts of language-specific data is a costly proposition in terms of data acquisition and computational resources.

Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models

1 code implementation 4 Oct 2024 Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly

Current approaches trivially append the target domain-specific vocabulary at the end of the PLM vocabulary.

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

1 code implementation 20 Sep 2024 Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly

In this paper, we propose the challenging tasks of Satirical Image Detection (detecting whether an image is satirical), Understanding (generating the reason behind the image being satirical), and Completion (given one half of the image, selecting the other half from 2 given options, such that the complete image is satirical) and release a high-quality dataset YesBut, consisting of 2547 images, 1084 satirical and 1463 non-satirical, containing different artistic styles, to evaluate those tasks.

Benchmarking Image Captioning

Unlocking Efficiency: Adaptive Masking for Gene Transformer Models

1 code implementation 13 Aug 2024 Soumyadeep Roy, Shamik Sural, Niloy Ganguly

Our findings reveal that CM-GEMS outperforms state-of-the-art models (DNABert-2, Nucleotide transformer, DNABert) trained at 120K steps, achieving similar results in just 10K and 1K steps.

Language Modeling Language Modelling +2

On The Persona-based Summarization of Domain-Specific Documents

1 code implementation 6 Jun 2024 Ankan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Pawan Goyal, Niloy Ganguly, Prasenjit Dey, Ravi Kokku

However, every persona of a domain has different requirements of information and hence their summarization.

MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization

1 code implementation 7 May 2024 Gunjan Balde, Soumyadeep Roy, Mainack Mondal, Niloy Ganguly

In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries.

Domain Adaptation Text Summarization

Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts

1 code implementation 3 May 2024 Subhendu Khatuya, Koushiki Sinha, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns like scientific articles or government reports.

Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling

1 code implementation 3 May 2024 Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags.

TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya

no code implementations 26 Apr 2024 Hailay Teklehaymanot, Dren Fazlija, Niloy Ganguly, Gourab K. Patro, Wolfgang Nejdl

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format.

Machine Translation Question Answering +1

Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions

1 code implementation 20 Apr 2024 Soumyadeep Roy, Aparup Khatua, Fatemeh Ghoochani, Uwe Hadler, Wolfgang Nejdl, Niloy Ganguly

In our annotated dataset, a substantial portion of GPT-4's incorrect responses is categorized as a "Reasonable response by GPT-4," by annotators.

Order-Based Pre-training Strategies for Procedural Text Understanding

1 code implementation 6 Apr 2024 Abhilash Nandy, Yash Kulkarni, Pawan Goyal, Niloy Ganguly

In this paper, we propose sequence-based pretraining methods to enhance procedural understanding in natural language processing.

Procedural Text Understanding

How COVID-19 has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era

1 code implementation 2 Apr 2024 Soham Poddar, Rajdeep Mukherjee, Subhendu Khatuya, Niloy Ganguly, Saptarshi Ghosh

The debate around vaccines has been going on for decades, but the COVID-19 pandemic showed how crucial it is to understand and mitigate anti-vaccine sentiments.

How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset

1 code implementation 30 Mar 2024 Akash Ghosh, B Venkata Sahith, Niloy Ganguly, Pawan Goyal, Mayank Singh

Question-answering (QA) on hybrid scientific tabular and textual data deals with scientific information, and relies on complex numerical reasoning.

Question Answering

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

no code implementations 8 Mar 2024 Arijit Nag, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs.


Long Dialog Summarization: An Analysis

no code implementations 26 Feb 2024 Ankan Mullick, Ayan Kumar Bhowmick, Raghav R, Ravi Kokku, Prasenjit Dey, Pawan Goyal, Niloy Ganguly

Dialog summarization has become increasingly important in managing and comprehending large-scale conversations across various domains.


BOXREC: Recommending a Box of Preferred Outfits in Online Shopping

1 code implementation 26 Feb 2024 Debopriyo Banerjee, Krothapalli Sreenivasa Rao, Shamik Sural, Niloy Ganguly

In this paper, we propose a box recommendation framework - BOXREC - which at first, collects user preferences across different item types (namely, top-wear, bottom-wear and foot-wear) including price-range of each type and a maximum shopping budget for a particular shopping session.

Recommendation Systems

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

1 code implementation 22 Oct 2023 Abhilash Nandy, Manav Nitin Kapadnis, Pawan Goyal, Niloy Ganguly

In this paper, we propose CLMSM, a domain-specific, continual pre-training framework, that learns from a large set of procedural recipes.

Contrastive Learning Language Modelling +2

GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning

1 code implementation 29 Jul 2023 Soumyadeep Roy, Jonas Wallat, Sowmya S Sundaram, Wolfgang Nejdl, Niloy Ganguly

Large-scale language models such as DNABert and LOGO aim to learn optimal gene representations and are trained on the entire Human Reference Genome.

Few-Shot Learning Language Modeling +2

$FastDoc$: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy

1 code implementation 9 Jun 2023 Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, Yash Parag Butala, Pawan Goyal, Niloy Ganguly

In this paper, we propose $FastDoc$ (Fast Continual Pre-training Technique using Document Level Metadata and Taxonomy), a novel, compute-efficient framework that utilizes Document metadata and Domain-Specific Taxonomy as supervision signals to continually pre-train transformer encoder on a domain-specific corpus.


CrysMMNet: Multimodal Representation for Crystal Property Prediction

1 code implementation 9 Jun 2023 Kishalay Das, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

In this work, we leverage textual descriptions of materials to model global structural information into graph structure and learn a more robust and enriched representation of crystalline materials.

Property Prediction

Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging

no code implementations 6 Jun 2023 Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh, Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly

The U.S. Securities and Exchange Commission (SEC) mandates all public companies to file periodic financial statements that should contain numerals annotated with a particular label from a taxonomy.

Benchmarking Sentence

FinRED: A Dataset for Relation Extraction in Financial Domain

1 code implementation 6 Jun 2023 Soumya Sharma, Tapas Nayak, Arusarka Bose, Ajay Kumar Meena, Koustuv Dasgupta, Niloy Ganguly, Pawan Goyal

Relation extraction models trained on a source domain cannot be applied on a different target domain due to the mismatch between relation sets.

Financial Relation Extraction Relation +1

IVP-VAE: Modeling EHR Time Series with Initial Value Problem Solvers

2 code implementations 11 May 2023 Jingge Xiao, Leonie Basso, Wolfgang Nejdl, Niloy Ganguly, Sandipan Sikdar

Continuous-time models such as Neural ODEs and Neural Flows have shown promising results in analyzing irregularly sampled time series frequently encountered in electronic health records.

Decoder Time Series

A Review of the Role of Causality in Developing Trustworthy AI Systems

1 code implementation 14 Feb 2023 Niloy Ganguly, Dren Fazlija, Maryam Badar, Marco Fisichella, Sandipan Sikdar, Johanna Schrader, Jonas Wallat, Koustav Rudra, Manolis Koubarakis, Gourab K. Patro, Wadhah Zai El Amri, Wolfgang Nejdl

This review aims to provide the reader with an overview of causal methods that have been developed to improve the trustworthiness of AI models.

CrysGNN: Distilling pre-trained knowledge to enhance property prediction for crystalline materials

1 code implementation 14 Jan 2023 Kishalay Das, Bidisha Samanta, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly

To leverage these untapped data, this paper presents CrysGNN, a new pre-trained GNN framework for crystalline materials, which captures both node and graph level structural information of crystal graphs using a huge amount of unlabelled material data.

Formation Energy Graph Neural Network +1

What You Like: Generating Explainable Topical Recommendations for Twitter Using Social Annotations

no code implementations 23 Dec 2022 Parantapa Bhattacharya, Saptarshi Ghosh, Muhammad Bilal Zafar, Soumya K. Ghosh, Niloy Ganguly

With over 500 million tweets posted per day on Twitter, it is difficult for users to discover interesting content in the deluge of uninteresting posts.

Collaborative Filtering Recommendation Systems

ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts

1 code implementation 22 Oct 2022 Rajdeep Mukherjee, Abhinav Bohra, Akash Banerjee, Soumya Sharma, Manjunath Hegde, Afreen Shaikh, Shivani Shrivastava, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel in summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports.

An Application to Generate Style Guided Compatible Outfit

1 code implementation 2 May 2022 Debopriyo Banerjee, Harsh Maheshwari, Lucky Dhakad, Arnab Bhattacharya, Niloy Ganguly, Muthusamy Chelliah, Suyash Agarwal

Fashion recommendation has witnessed a phenomenal growth of research, particularly in the domains of shop-the-look, context-aware outfit creation, personalizing outfit creation, etc.

Scheduling Virtual Conferences Fairly: Achieving Equitable Participant and Speaker Satisfaction

1 code implementation 26 Apr 2022 Gourab K. Patro, Prithwish Jana, Abhijnan Chakraborty, Krishna P. Gummadi, Niloy Ganguly

As the efficiency and fairness objectives can be in conflict with each other, we propose a joint optimization framework that allows conference organizers to design schedules that balance (i.e., allow trade-offs) among efficiency, participant fairness and speaker fairness objectives.

Fairness Scheduling

A Generative Approach for Financial Causality Extraction

1 code implementation 12 Apr 2022 Tapas Nayak, Soumya Sharma, Yash Butala, Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly

Causality represents the foremost relation between events in financial documents such as financial news articles and financial reports.


Recommendation of Compatible Outfits Conditioned on Style

no code implementations 30 Mar 2022 Debopriyo Banerjee, Lucky Dhakad, Harsh Maheshwari, Muthusamy Chelliah, Niloy Ganguly, Arnab Bhattacharya

Recommendation in the fashion domain has seen a recent surge in research in various areas, for example, shop-the-look, context-aware outfit creation, personalizing outfit creation, etc.

Offsetting Unequal Competition through RL-assisted Incentive Schemes

no code implementations 5 Jan 2022 Paramita Koley, Aurghya Maiti, Sourangshu Bhattacharya, Niloy Ganguly

On inspecting, we realize that an overall incentive scheme for the weak team does not incentivize the weaker agents within that team to learn and improve.

Multi-agent Reinforcement Learning reinforcement-learning +1

Towards Fair Recommendation in Two-Sided Platforms

1 code implementation 26 Dec 2021 Arpita Biswas, Gourab K Patro, Niloy Ganguly, Krishna P. Gummadi, Abhijnan Chakraborty

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services.

Fairness Vocal Bursts Valence Prediction

A Data Bootstrapping Recipe for Low Resource Multilingual Relation Classification

no code implementations 18 Oct 2021 Arijit Nag, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, Soumen Chakrabarti

Relation classification (sometimes called 'extraction') requires trustworthy datasets for fine-tuning large language models, as well as for evaluation.

Classification Relation +1

Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach

1 code implementation 9 May 2021 Rajdeep Mukherjee, Atharva Naik, Sriyash Poddar, Soham Dasgupta, Niloy Ganguly

For the regression task, VADEC, when trained with SenWave, achieves 7.6% and 16.5% gains in Pearson Correlation scores over the current state-of-the-art on the EMOBANK dataset for the Valence (V) and Dominance (D) affect dimensions respectively.

Emotion Classification regression +1

Convex Online Video Frame Subset Selection using Multiple Criteria for Data Efficient Autonomous Driving

no code implementations 24 Mar 2021 Soumi Das, Harikrishna Patibandla, Suparna Bhattacharya, Kshounis Bera, Niloy Ganguly, Sourangshu Bhattacharya

We design a novel convex optimization-based multi-criteria online subset selection algorithm that uses a thresholded concave function of selection variables.

Autonomous Driving

Demarcating Endogenous and Exogenous Opinion Dynamics: An Experimental Design Approach

no code implementations 11 Feb 2021 Paramita Koley, Avirup Saha, Sourangshu Bhattacharya, Niloy Ganguly, Abir De

The networked opinion diffusion in online social networks (OSN) is often governed by the two genres of opinions - endogenous opinions that are driven by the influence of social contacts among users, and exogenous opinions which are formed by external effects like news, feeds etc.

Experimental Design

An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation

1 code implementation 19 Nov 2020 Soumyadeep Roy, Shamik Sural, Niyati Chhaya, Anandhavelu Natarajan, Niloy Ganguly

A consumer-dependent (business-to-consumer) organization tends to present itself as possessing a set of human qualities, which is termed as the brand personality of the company.

Marketing Sentence

On Fair Virtual Conference Scheduling: Achieving Equitable Participant and Speaker Satisfaction

no code implementations 24 Oct 2020 Gourab K Patro, Abhijnan Chakraborty, Niloy Ganguly, Krishna P. Gummadi

We show that the welfare and fairness objectives can be in conflict with each other, and there is a need to maintain a balance between these objectives while caring for them simultaneously.

Fairness Scheduling

Fine-grained Sentiment Controlled Text Generation

no code implementations 17 Jun 2020 Bidisha Samanta, Mohit Agarwal, Niloy Ganguly

DE-VAE achieves better control of sentiment as an attribute while preserving the content by learning a suitable lossless transformation network from the disentangled sentiment space to the desired entangled representation.

Attribute Text Generation

FairRec: Two-Sided Fairness for Personalized Recommendations in Two-Sided Platforms

2 code implementations 25 Feb 2020 Gourab K Patro, Arpita Biswas, Niloy Ganguly, Krishna P. Gummadi, Abhijnan Chakraborty

We investigate the problem of fair recommendation in the context of two-sided online platforms, comprising customers on one side and producers on the other.

Fairness Vocal Bursts Valence Prediction

Regression Under Human Assistance

1 code implementation 6 Sep 2019 Abir De, Nastaran Okati, Paramita Koley, Niloy Ganguly, Manuel Gomez-Rodriguez

In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels.

BIG-bench Machine Learning Medical Diagnosis +1

Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs

1 code implementation IJCNLP 2019 Soumya Sharma, Bishal Santra, Abhik Jana, T. Y. S. S. Santosh, Niloy Ganguly, Pawan Goyal

Specifically, we experiment with fusing embeddings obtained from a knowledge graph with the state-of-the-art approaches for the NLI task (ESIM model).

Knowledge Graphs

A Deep Generative Model for Code-Switched Text

1 code implementation 21 Jun 2019 Bidisha Samanta, Sharmila Reddy, Hussain Jagirdar, Niloy Ganguly, Soumen Chakrabarti

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies.


AttentiveChecker: A Bi-Directional Attention Flow Mechanism for Fact Verification

no code implementations NAACL 2019 Santosh Tokala, Vishal G, Avirup Saha, Niloy Ganguly

The recently released FEVER dataset provided benchmark results on a fact-checking task in which given a factual claim, the system must extract textual evidence (sets of sentences from Wikipedia pages) that support or refute the claim.

Claim Verification Fact Checking +3

NeVAE: A Deep Generative Model for Molecular Graphs

2 code implementations 14 Feb 2018 Bidisha Samanta, Abir De, Gourhari Jana, Pratim Kumar Chattaraj, Niloy Ganguly, Manuel Gomez-Rodriguez

Moreover, in contrast with the state of the art, our decoder is able to provide the spatial coordinates of the atoms of the molecules it generates.

Bayesian Optimization Decoder +1

A graphical framework to detect and categorize diverse opinions from online news

no code implementations WS 2016 Ankan Mullick, Pawan Goyal, Niloy Ganguly

This paper proposes a graphical framework to extract opinionated sentences which highlight different contexts within a given news article by introducing the concept of diversity in a graphical model for opinion detection. We conduct extensive evaluations and find that the proposed modification leads to impressive improvement in performance and makes the final results of the model much more usable.

Diversity General Classification +1

Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments

no code implementations LREC 2016 Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly

Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism where speakers switch between two or more languages when interacting with each other.

Discriminative Link Prediction using Local Links, Node Features and Community Structure

no code implementations 17 Oct 2013 Abir De, Niloy Ganguly, Soumen Chakrabarti

Apart from the new predictor, another contribution is a rigorous protocol for benchmarking and reporting LP algorithms, which reveals the regions of strengths and weaknesses of all the predictors studied here, and establishes the new proposal as the most robust.

Benchmarking Clustering +1
