Search Results for author: Diptesh Kanojia

Found 76 papers, 20 papers with code

“So You Think You’re Funny?”: Rating the Humour Quotient in Standup Comedy

1 code implementation EMNLP 2021 Anirudh Mittal, Pranav Jeevan P, Prerak Gandhi, Diptesh Kanojia, Pushpak Bhattacharyya

The normalized duration (laughter duration divided by the clip duration) of laughter in each clip is used to compute this humour coefficient score on a five-point scale (0-4).

Mapping it differently: A solution to the linking challenges

no code implementations GWC 2016 Meghna Singh, Rajita Shukla, Jaya Saraswati, Laxmi Kashyap, Diptesh Kanojia, Pushpak Bhattacharyya

This paper reports the work of creating bilingual mappings in English for certain synsets of Hindi wordnet, the need for doing this, the methods adopted and the tools created for the task.

Information Retrieval Retrieval +4

Synthesizing Audio for Hindi WordNet

no code implementations GWC 2018 Diptesh Kanojia, Preethi Jyothi, Pushpak Bhattacharyya

We also develop voices using the existing implementations of the aforementioned systems, and (2) We use these voices to generate sample audios for randomly chosen words; manually evaluate the audio generated, and produce audio for all WordNet words using the winner voice model.

Speech Synthesis

pyiwn: A Python based API to access Indian Language WordNets

no code implementations GWC 2018 Ritesh Panjwani, Diptesh Kanojia, Pushpak Bhattacharyya

Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet.

Speech Synthesis

Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems

1 code implementation 20 Mar 2025 Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

The generated Chinese homophones, along with their manual translations, are utilized to generate perturbations and to probe the robustness of existing quality evaluation models, including models trained using multi-task learning, fine-tuned variants of multilingual language models, as well as large language models (LLMs).

Machine Translation Multi-Task Learning

GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music

no code implementations 25 Feb 2025 Xinran Liu, Xu Dong, Diptesh Kanojia, Wenwu Wang, ZhenHua Feng

To overcome these challenges, we propose GCDance, a classifier-free diffusion framework for generating genre-specific dance motions conditioned on both music and textual prompts.

Rhythm

Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content

1 code implementation 11 Feb 2025 Girish A. Koushik, Diptesh Kanojia, Helen Treharne

This paper presents a systematic analysis of fusion-based approaches for multimodal hate detection, focusing on their performance across video and image-based content.

Hate Speech Detection Video Classification

Giving the Old a Fresh Spin: Quality Estimation-Assisted Constrained Decoding for Automatic Post-Editing

no code implementations 28 Jan 2025 Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya

Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, diverging from the principle of minimal editing.

Automatic Post-Editing

When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages

no code implementations 8 Jan 2025 Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Shenbin Qian

This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE).

Transliteration

BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

no code implementations 6 Dec 2024 Dipankar Srirag, Aditya Joshi, Jordan Painter, Diptesh Kanojia

Subsequently, we fine-tune nine large language models (LLMs) (representing a range of encoder/decoder and mono/multilingual models) on these datasets, and evaluate their performance on the two tasks.

Sarcasm Detection Sentiment Analysis

A Survey of Multimodal Sarcasm Detection

no code implementations 24 Oct 2024 Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri

In this paper, we present the first comprehensive survey on multimodal sarcasm detection - henceforth MSD - to date.

Sarcasm Detection Survey

Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages

no code implementations 23 Oct 2024 Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya

This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages.

Automatic Post-Editing Data Augmentation +2

Centrality-aware Product Retrieval and Ranking

no code implementations 21 Oct 2024 Hadeel Saadany, Swapnil Bhosale, Samarth Agrawal, Diptesh Kanojia, Constantin Orasan, Zhe Wu

We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimises for the user intent in semantic product search.

Retrieval

Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

1 code implementation 8 Oct 2024 Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations.

In-Context Learning Machine Translation +2

Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

1 code implementation 8 Oct 2024 Félix do Carmo, Diptesh Kanojia

The tutorial describes the concept of edit distances applied to research and commercial contexts.

Translation

What do Large Language Models Need for Machine Translation Evaluation?

1 code implementation 4 Oct 2024 Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Frédéric Blain

For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models.

Machine Translation Translation

A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

no code implementations 4 Oct 2024 Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo

We extend it with sentence-level evaluation scores and word-level labels, leading to a dataset suitable for sentence- and word-level translation evaluation and emotion classification, in a multi-task setting.

Emotion Classification Machine Translation +3

Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios

no code implementations 19 Sep 2024 Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, Haiyue Song

Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), Creoles (languages arising from linguistic contact between multiple languages) and other low-resource languages.

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

no code implementations 13 Jun 2024 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the space relation from the listener and sound source.

Audio Synthesis NeRF

Unsupervised Audio-Visual Segmentation with Modality Alignment

no code implementations 21 Mar 2024 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.

Contrastive Learning

Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication

no code implementations 6 Feb 2024 Jaleh Delfani, Constantin Orasan, Hadeel Saadany, Ozlem Temizoz, Eleanor Taylor-Stilgoe, Diptesh Kanojia, Sabine Braun, Barbara Schouten

This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the MHealth domain from English to Persian, Arabic, Turkish, Romanian, and Spanish.

Translation

Natural Language Processing for Dialects of a Language: A Survey

no code implementations 11 Jan 2024 Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.

Attribute Machine Translation +5

SurreyAI 2023 Submission for the Quality Estimation Shared Task

no code implementations 1 Dec 2023 Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe

Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available.

Sentence

CreoleVal: Multilingual Multitask Benchmarks for Creoles

1 code implementation 30 Oct 2023 Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva

Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data.

Machine Translation Reading Comprehension +2

Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

no code implementations 29 Sep 2023 Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia

The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, have identified sarcasm to be a multi-modal phenomenon -- expressed not only in natural language text, but also through manners of speech (like tonality and intonation) and visual cues (facial expression).

Benchmarking Diversity +2

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

no code implementations 13 Sep 2023 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu

Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects.

Segmentation

Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for the Quality Assessment of Emotion Translation

1 code implementation 20 Jun 2023 Shenbin Qian, Constantin Orasan, Felix Do Carmo, Qiuliang Li, Diptesh Kanojia

In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper.

Machine Translation Negation +1

Applications and Challenges of Sentiment Analysis in Real-life Scenarios

no code implementations 24 Jan 2023 Diptesh Kanojia, Aditya Joshi

Sentiment analysis has benefited from the availability of lexicons and benchmark datasets created over decades of research.

Selection bias Sentiment Analysis

Harnessing Abstractive Summarization for Fact-Checked Claim Detection

1 code implementation COLING 2022 Varad Bhatnagar, Diptesh Kanojia, Kameswari Chebrolu

We propose a new workflow for efficiently detecting previously fact-checked claims that uses abstractive summarization to generate crisp queries.

Abstractive Text Summarization Fact Checking +2

HiNER: A Large Hindi Named Entity Recognition Dataset

1 code implementation LREC 2022 Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia, Pushpak Bhattacharyya

We use different language models to perform the sequence labelling task for NER and show the efficacy of our data by performing a comparative evaluation with models trained on another dataset available for the Hindi NER task.

named-entity-recognition Named Entity Recognition +2

PLOD: An Abbreviation Detection Dataset for Scientific Documents

1 code implementation LREC 2022 Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orăsan

This paper presents PLOD, a large-scale dataset for abbreviation detection and extraction that contains 160k+ segments automatically annotated with abbreviations and their long forms.

AbbreviationDetection Information Retrieval +3

Indian Language Wordnets and their Linkages with Princeton WordNet

no code implementations LREC 2018 Diptesh Kanojia, Kevin Patel, Pushpak Bhattacharyya

Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages.

Strategies of Effective Digitization of Commentaries and Sub-commentaries: Towards the Construction of Textual History

no code implementations 5 Jan 2022 Diptesh Kanojia, Malhar Kulkarni, Sayali Ghodekar, Eivind Kahrs, Pushpak Bhattacharyya

We use the text of the Kāśikāvṛtti (KV) as a sample text, and with the help of philologists, we digitize the commentaries available to us.

Utilizing Wordnets for Cognate Detection among Indian Languages

no code implementations GWC 2019 Diptesh Kanojia, Kevin Patel, Pushpak Bhattacharyya, Malhar Kulkarni, Gholamreza Haffari

Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, Information Retrieval and Computational Phylogenetics.

Information Retrieval Machine Translation +1

Challenge Dataset of Cognates and False Friend Pairs from Indian Languages

1 code implementation LREC 2020 Diptesh Kanojia, Pushpak Bhattacharyya, Malhar Kulkarni, Gholamreza Haffari

In this paper, we describe the creation of two cognate datasets for twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam.

Information Retrieval Machine Translation +2

Automated Evidence Collection for Fake News Detection

1 code implementation ICON 2021 Mrinal Rawat, Diptesh Kanojia

The results show that our approach outperforms the state-of-the-art methods in fake news detection to achieve an F1-score of 99.25 over the dataset provided for the CONSTRAINT-2021 Shared Task.

Fake News Detection Misinformation

"So You Think You're Funny?": Rating the Humour Quotient in Standup Comedy

1 code implementation 25 Oct 2021 Anirudh Mittal, Pranav Jeevan, Prerak Gandhi, Diptesh Kanojia, Pushpak Bhattacharyya

We devise a novel scoring mechanism to annotate the training data with a humour quotient score using the audience's laughter.

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

1 code implementation WMT (EMNLP) 2021 Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia

However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.

Machine Translation Translation

Cognitively Aided Zero-Shot Automatic Essay Grading

no code implementations ICON 2020 Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Pushpak Bhattacharyya

Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the prompt.

Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour

1 code implementation Asian Chapter of the Association for Computational Linguistics 2020 Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya

To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays.

Multi-Task Learning Named Entity Recognition (NER) +1

Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study

no code implementations LREC 2020 Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient.

Sentence Sentiment Analysis +1

Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study

no code implementations 9 Apr 2020 Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya

Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient.

Sentence Sentiment Analysis +1

New Vistas to study Bhartrhari: Cognitive NLP

no code implementations 10 Oct 2018 Jayashree Gajjam, Diptesh Kanojia, Malhar Kulkarni

The notions of a sentence and a word as a meaningful linguistic unit in the language have been a subject matter for the discussion in many works that followed later on.

Sentence

Is your Statement Purposeless? Predicting Computer Science Graduation Admission Acceptance based on Statement Of Purpose

no code implementations WS 2017 Diptesh Kanojia, Nikhil Wani, Pushpak Bhattacharyya

We present a quantitative, data-driven machine learning approach to mitigate the problem of unpredictability of Computer Science Graduate School Admissions.

Leveraging Cognitive Features for Sentiment Analysis

no code implementations CONLL 2016 Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya

Sentiments expressed in user-generated short text and sentences are nuanced by subtleties at lexical, syntactic, semantic and pragmatic levels.

General Classification Sarcasm Detection +1

Harnessing Cognitive Features for Sarcasm Detection

no code implementations ACL 2016 Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya

In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive features extracted from eye-movement patterns of human readers.

Sarcasm Detection Sentence +1

Civique: Using Social Media to Detect Urban Emergencies

no code implementations 14 Oct 2016 Diptesh Kanojia, Vishwajeet Kumar, Krithi Ramamritham

We present the Civique system for emergency detection in urban areas by monitoring micro blogs like Tweets.

That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models

no code implementations LREC 2016 Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya, Mark James Carman

As demonstrated by the quality of our coarse lexical resource and its benefit to MT, we believe that our sentential approach to create such a resource will help MT for resource-constrained languages.

Machine Translation Topic Models +1
