no code implementations • WASSA (ACL) 2022 • Shenbin Qian, Constantin Orasan, Diptesh Kanojia, Hadeel Saadany, Félix do Carmo
This paper summarises the submissions that our team, SURREY-CTS-NLP, made to the WASSA 2022 Shared Task on the prediction of empathy, distress and emotion.
1 code implementation • EMNLP 2021 • Anirudh Mittal, Pranav Jeevan P, Prerak Gandhi, Diptesh Kanojia, Pushpak Bhattacharyya
The normalized duration (laughter duration divided by the clip duration) of laughter in each clip is used to compute this humour coefficient score on a five-point scale (0-4).
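As a rough illustration of the scoring idea in this snippet, the sketch below divides laughter duration by clip duration and maps the ratio onto a 0-4 scale. The cut-off values in `ASSUMED_THRESHOLDS` and both function names are hypothetical; the snippet does not say how the ratio is binned, so this is a sketch under that assumption and not the authors' procedure.

```python
from bisect import bisect_right

# Hypothetical cut-offs between adjacent scores; the source does not state them.
ASSUMED_THRESHOLDS = (0.05, 0.10, 0.20, 0.30)


def normalized_laughter_duration(laughter_seconds: float, clip_seconds: float) -> float:
    """Return laughter duration divided by clip duration, clamped to [0, 1]."""
    if clip_seconds <= 0:
        raise ValueError("clip duration must be positive")
    ratio = laughter_seconds / clip_seconds
    return min(max(ratio, 0.0), 1.0)


def humour_score(ratio: float, thresholds=ASSUMED_THRESHOLDS) -> int:
    """Map a normalized duration to an integer score from 0 to 4.

    The score is the number of assumed thresholds that the ratio
    meets or exceeds, so four thresholds give five possible scores.
    """
    return bisect_right(thresholds, ratio)


if __name__ == "__main__":
    ratio = normalized_laughter_duration(laughter_seconds=2.0, clip_seconds=10.0)
    print(ratio, humour_score(ratio))
```

With four thresholds, `bisect_right` yields exactly five bins, which matches the five-point scale; any other binning rule could be substituted without changing the normalization step.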
no code implementations • GWC 2016 • Diptesh Kanojia, Raj Dabre, Pushpak Bhattacharyya
India is a country with 22 officially recognized languages and 17 of these have WordNets, a crucial resource.
no code implementations • GWC 2016 • Diptesh Kanojia, Shehzaad Dhuliawala, Pushpak Bhattacharyya
Our contribution is threefold: (1) We develop a system that, given a synset in English, finds an appropriate image for the synset.
no code implementations • GWC 2016 • Meghna Singh, Rajita Shukla, Jaya Saraswati, Laxmi Kashyap, Diptesh Kanojia, Pushpak Bhattacharyya
This paper reports the work of creating bilingual mappings in English for certain synsets of the Hindi WordNet, the need for doing this, the methods adopted and the tools created for the task.
no code implementations • GWC 2018 • Hanumant Redkar, Rajita Shukla, Sandhya Singh, Jaya Saraswati, Laxmi Kashyap, Diptesh Kanojia, Preethi Jyothi, Malhar Kulkarni, Pushpak Bhattacharyya
This aid is based on modern pedagogical axioms and is aligned with the learning objectives of school education syllabi in India.
no code implementations • GWC 2018 • Diptesh Kanojia, Preethi Jyothi, Pushpak Bhattacharyya
We also develop voices using the existing implementations of the aforementioned systems, and (2) We use these voices to generate sample audio for randomly chosen words, manually evaluate the generated audio, and produce audio for all WordNet words using the winning voice model.
no code implementations • GWC 2018 • Ritesh Panjwani, Diptesh Kanojia, Pushpak Bhattacharyya
Indian language WordNets have their individual web-based browsing interfaces along with a common interface for IndoWordNet.
1 code implementation • 20 Mar 2025 • Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo
The generated Chinese homophones, along with their manual translations, are utilized to generate perturbations and to probe the robustness of existing quality evaluation models, including models trained using multi-task learning, fine-tuned variants of multilingual language models, as well as large language models (LLMs).
no code implementations • 25 Feb 2025 • Xinran Liu, Xu Dong, Diptesh Kanojia, Wenwu Wang, ZhenHua Feng
To overcome these challenges, we propose GCDance, a classifier-free diffusion framework for generating genre-specific dance motions conditioned on both music and textual prompts.
1 code implementation • 11 Feb 2025 • Girish A. Koushik, Diptesh Kanojia, Helen Treharne
This paper presents a systematic analysis of fusion-based approaches for multimodal hate detection, focusing on their performance across video and image-based content.
Ranked #1 on Hate Speech Detection on HateMM
no code implementations • 28 Jan 2025 • Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya
Automatic Post-Editing (APE) systems often struggle with over-correction, where unnecessary modifications are made to a translation, diverging from the principle of minimal editing.
no code implementations • 8 Jan 2025 • Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Shenbin Qian
This paper investigates the reference-less evaluation of machine translation for low-resource language pairs, known as quality estimation (QE).
no code implementations • 10 Dec 2024 • Fatemeh Nazarieh, ZhenHua Feng, Diptesh Kanojia, Muhammad Awais, Josef Kittler
Audio-driven talking face generation is a challenging task in digital communication.
no code implementations • 6 Dec 2024 • Dipankar Srirag, Aditya Joshi, Jordan Painter, Diptesh Kanojia
Subsequently, we fine-tune nine large language models (LLMs) (representing a range of encoder/decoder and mono/multilingual models) on these datasets, and evaluate their performance on the two tasks.
no code implementations • 24 Oct 2024 • Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri
In this paper, we present the first comprehensive survey to date on multimodal sarcasm detection (henceforth MSD).
no code implementations • 23 Oct 2024 • Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya
This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages.
no code implementations • 21 Oct 2024 • Hadeel Saadany, Swapnil Bhosale, Samarth Agrawal, Diptesh Kanojia, Constantin Orasan, Zhe Wu
We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimises for the user intent in semantic product search.
no code implementations • 15 Oct 2024 • Dipankar Srirag, Jordan Painter, Aditya Joshi, Diptesh Kanojia
Existing benchmarks often fail to account for linguistic diversity, like language variants of English.
1 code implementation • 8 Oct 2024 • Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo
This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations.
1 code implementation • 8 Oct 2024 • Félix do Carmo, Diptesh Kanojia
The tutorial describes the concept of edit distances applied to research and commercial contexts.
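For readers unfamiliar with the concept, a minimal sketch of one common edit distance (Levenshtein distance) is given below. The function name `edit_distance` is illustrative, and the choice of unit costs for insertion, deletion and substitution is an assumption; the tutorial may cover other variants and weightings.

```python
from typing import Sequence


def edit_distance(source: Sequence, target: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or tokens).

    Insertions, deletions and substitutions each cost 1 (an assumed,
    conventional weighting). Only one previous row is kept in memory.
    """
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]  # cost of deleting the first i source items
        for j, t_item in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_item != t_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Character-level distance
    print(edit_distance("kitten", "sitting"))
    # Word-level distance, e.g. between a translation and its post-edited version
    print(edit_distance("the cat sat".split(), "the cat sits down".split()))
```

Because the function accepts any sequence, the same code works on characters or on word tokens, which is the usual choice when comparing a translation with its edited version.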
1 code implementation • 4 Oct 2024 • Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Frédéric Blain
For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models.
no code implementations • 4 Oct 2024 • Shenbin Qian, Constantin Orăsan, Diptesh Kanojia, Félix do Carmo
We extend it with sentence-level evaluation scores and word-level labels, leading to a dataset suitable for sentence- and word-level translation evaluation and emotion classification, in a multi-task setting.
no code implementations • 19 Sep 2024 • Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, Haiyue Song
Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), Creoles (languages arising from linguistic contact between multiple languages) and other low-resource languages.
no code implementations • 13 Jun 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the space relation from the listener and sound source.
no code implementations • 21 Mar 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.
no code implementations • 6 Feb 2024 • Jaleh Delfani, Constantin Orasan, Hadeel Saadany, Ozlem Temizoz, Eleanor Taylor-Stilgoe, Diptesh Kanojia, Sabine Braun, Barbara Schouten
This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the MHealth domain from English to Persian, Arabic, Turkish, Romanian, and Spanish.
1 code implementation • 26 Jan 2024 • Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi.
no code implementations • 11 Jan 2024 • Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold
Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
no code implementations • 18 Dec 2023 • Akshay Batheja, Sourabh Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya
We propose a repair-filter-use methodology that uses an APE system to correct errors on the target side of the MT training data.
no code implementations • 1 Dec 2023 • Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe
Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available.
1 code implementation • 30 Oct 2023 • Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Ruth-Ann Armstrong, Abee Eijansantos, Catriona Malau, Hans Erik Heje, Ernests Lavrinovics, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply a significant potential for transfer learning, this potential is hampered by the lack of annotated data.
no code implementations • 29 Sep 2023 • Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia
The MUStARD dataset and its emotion recognition extension, MUStARD++, have identified sarcasm as a multi-modal phenomenon, expressed not only in natural language text but also through manners of speech (like tonality and intonation) and visual cues (facial expression).
no code implementations • 13 Sep 2023 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu
Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects.
no code implementations • 14 Aug 2023 • Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
In this work, we reformulate the SED problem by taking a generative learning perspective.
1 code implementation • 20 Jun 2023 • Shenbin Qian, Constantin Orasan, Felix Do Carmo, Qiuliang Li, Diptesh Kanojia
We focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts by evaluating outputs from Google Translate according to a framework proposed in this paper.
no code implementations • 24 Jan 2023 • Diptesh Kanojia, Aditya Joshi
Sentiment analysis has benefited from the availability of lexicons and benchmark datasets created over decades of research.
1 code implementation • COLING 2022 • Varad Bhatnagar, Diptesh Kanojia, Kameswari Chebrolu
We propose a new workflow for efficiently detecting previously fact-checked claims that uses abstractive summarization to generate crisp queries.
1 code implementation • LREC 2022 • Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia, Pushpak Bhattacharyya
We use different language models to perform the sequence labelling task for NER and show the efficacy of our data by performing a comparative evaluation with models trained on another dataset available for the Hindi NER task.
Ranked #1 on Named Entity Recognition (NER) on HiNER-original
1 code implementation • LREC 2022 • Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orăsan
This paper presents PLOD, a large-scale dataset for abbreviation detection and extraction that contains 160k+ segments automatically annotated with abbreviations and their long forms.
Ranked #1 on Abbreviation Detection on PLOD-unfiltered
1 code implementation • 9 Jan 2022 • Prashant Sharma, Hadeel Saadany, Leonardo Zilio, Diptesh Kanojia, Constantin Orăsan
Acronyms are abbreviated units of a phrase constructed by using initial components of the phrase in a text.
no code implementations • LREC 2018 • Diptesh Kanojia, Kevin Patel, Pushpak Bhattacharyya
Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages.
no code implementations • 5 Jan 2022 • Diptesh Kanojia, Malhar Kulkarni, Sayali Ghodekar, Eivind Kahrs, Pushpak Bhattacharyya
We use the text of the Kāśikāvṛtti (KV) as a sample text, and with the help of philologists, we digitize the commentaries available to us.
no code implementations • GWC 2018 • Kevin Patel, Diptesh Kanojia, Pushpak Bhattacharyya
Thus techniques that can aid the experts are desirable.
no code implementations • 5 Jan 2022 • Swaraja Salaskar, Diptesh Kanojia, Malhar Kulkarni
Our paper attempts to show the implication of the creation of our tool in this area.
no code implementations • GWC 2019 • Diptesh Kanojia, Kevin Patel, Pushpak Bhattacharyya, Malhar Kulkarni, Gholamreza Haffari
Automatic Cognate Detection (ACD) is a challenging task which has been utilized to help NLP applications like Machine Translation, Information Retrieval and Computational Phylogenetics.
no code implementations • 27 Dec 2021 • Kumar Saurav, Kumar Saunack, Diptesh Kanojia, Pushpak Bhattacharyya
In this paper, we use various existing approaches to create multiple word embeddings for 14 Indian languages.
no code implementations • 21 Dec 2021 • Sandeep Mathias, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya
Gaze behaviour has been used as a way to gather cognitive information for a number of years.
1 code implementation • LREC 2020 • Diptesh Kanojia, Pushpak Bhattacharyya, Malhar Kulkarni, Gholamreza Haffari
In this paper, we describe the creation of two cognate datasets for twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam.
1 code implementation • COLING 2020 • Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni
We then evaluate the impact of our cognate detection mechanism on neural machine translation (NMT) as a downstream task.
1 code implementation • EACL 2021 • Diptesh Kanojia, Prashant Sharma, Sayali Ghodekar, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni
We collect gaze behaviour data for a small sample of cognates and show that extracted cognitive features help the task of cognate detection.
1 code implementation • ICON 2021 • Mrinal Rawat, Diptesh Kanojia
The results show that our approach outperforms the state-of-the-art methods in fake news detection to achieve an F1-score of 99.25 over the dataset provided for the CONSTRAINT-2021 Shared Task.
1 code implementation • 25 Oct 2021 • Anirudh Mittal, Pranav Jeevan, Prerak Gandhi, Diptesh Kanojia, Pushpak Bhattacharyya
We devise a novel scoring mechanism to annotate the training data with a humour quotient score using the audience's laughter.
1 code implementation • WMT (EMNLP) 2021 • Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia
However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.
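To make the current practice mentioned here concrete, the sketch below computes a Pearson correlation between system scores and human judgements. Pearson's r is assumed as the correlation measure, and the function name and sample numbers are purely illustrative; they are not taken from the paper or the shared task data.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], human: Sequence[float]) -> float:
    """Pearson's r between predicted quality scores and human judgements."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    mean_p = sum(predicted) / len(predicted)
    mean_h = sum(human) / len(human)
    covariance = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / math.sqrt(var_p * var_h)


if __name__ == "__main__":
    # Illustrative numbers only: QE system outputs vs. human direct-assessment scores
    system_scores = [0.71, 0.42, 0.88, 0.15]
    human_scores = [75.0, 50.0, 90.0, 20.0]
    print(round(pearson_correlation(system_scores, human_scores), 3))
```

A high correlation computed this way says nothing about whether a system reacts to specific translation errors, which is the gap the snippet points to.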
no code implementations • ICON 2020 • Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Pushpak Bhattacharyya
Automatic essay grading (AEG) is a process in which machines assign a grade to an essay written in response to a topic, called the prompt.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya
To demonstrate the efficacy of this multi-task learning based approach to automatic essay grading, we collect gaze behaviour for 48 essays across 4 essay sets, and learn gaze behaviour for the rest of the essays, numbering over 7000 essays.
no code implementations • LREC 2020 • Saurav Kumar, Saunack Kumar, Diptesh Kanojia, Pushpak Bhattacharyya
In this paper, we use various existing approaches to create multiple word embeddings for 14 Indian languages.
no code implementations • LREC 2020 • Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya
Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient.
no code implementations • 9 Apr 2020 • Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya
Cross-domain sentiment analysis (CDSA) helps to address the problem of data scarcity in scenarios where labelled data for a domain (known as the target domain) is unavailable or insufficient.
no code implementations • ACL 2018 • Sandeep Mathias, Diptesh Kanojia, Kevin Patel, Samarth Agarwal, Abhijit Mishra, Pushpak Bhattacharyya
Such subjective aspects are better handled using cognitive information.
no code implementations • 10 Oct 2018 • Jayashree Gajjam, Diptesh Kanojia, Malhar Kulkarni
The notions of a sentence and a word as meaningful linguistic units in a language have been the subject of discussion in many later works.
no code implementations • WS 2017 • Diptesh Kanojia, Nikhil Wani, Pushpak Bhattacharyya
We present a quantitative, data-driven machine learning approach to mitigate the problem of unpredictability of Computer Science Graduate School Admissions.
no code implementations • CONLL 2016 • Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya
Sentiments expressed in user-generated short text and sentences are nuanced by subtleties at lexical, syntactic, semantic and pragmatic levels.
no code implementations • ACL 2016 • Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya
In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive features extracted from eye-movement patterns of human readers.
no code implementations • 14 Oct 2016 • Diptesh Kanojia, Vishwajeet Kumar, Krithi Ramamritham
We present the Civique system for emergency detection in urban areas by monitoring microblogs such as tweets.
no code implementations • LREC 2016 • Shehzaad Dhuliawala, Diptesh Kanojia, Pushpak Bhattacharyya
We present a WordNet-like structured resource for slang words and neologisms on the internet.
no code implementations • LREC 2016 • Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya, Mark James Carman
As demonstrated by the quality of our coarse lexical resource and its benefit to MT, we believe that our sentential approach to creating such a resource will help MT for resource-constrained languages.