2 code implementations • 3 Feb 2025 • Vernon Y. H. Toh, Yew Ken Chia, Deepanway Ghosal, Soujanya Poria
To this end, we track the evolution of the GPT-[n] and o-[n] series models on challenging multimodal puzzles, requiring fine-grained visual perception with abstract or algorithmic reasoning.
1 code implementation • 16 Dec 2024 • Qi Sun, Pengfei Hong, Tej Deep Pala, Vernon Toh, U-Xuan Tan, Deepanway Ghosal, Soujanya Poria
Traditional reinforcement learning-based robotic control methods are often task-specific and fail to generalize across diverse environments or unseen objects and instructions.
no code implementations • 17 Oct 2024 • Jinjie Ni, YiFan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Shieh
Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development.
1 code implementation • 16 Oct 2024 • Vernon Y. H. Toh, Deepanway Ghosal, Soujanya Poria
Large language models (LLMs) have shown increasing competence in solving mathematical reasoning problems.
1 code implementation • 18 Jun 2024 • Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro
It is an open challenge to obtain high-quality training data, especially captions, for text-to-audio models.
Ranked #5 on Audio Generation on AudioCaps (CLAP_LAION metric)
1 code implementation • 15 Apr 2024 • Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria
These models do not explicitly focus on the presence of concepts or events and their temporal ordering in the output audio with respect to the input prompt.
2 code implementations • 20 Mar 2024 • Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria
To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning.
2 code implementations • 6 Mar 2024 • Deepanway Ghosal, Vernon Toh Yan Han, Chia Yew Ken, Soujanya Poria
We present a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that necessitate visual understanding, language understanding, and complex algorithmic reasoning.
Ranked #1 on Multimodal Reasoning on AlgoPuzzleVQA
1 code implementation • 17 Jan 2024 • Pengfei Hong, Navonil Majumder, Deepanway Ghosal, Somak Aditya, Rada Mihalcea, Soujanya Poria
Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance.
4 code implementations • 14 Nov 2023 • Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria
Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models such as MusicGen and AudioLDM2.
Ranked #1 on Text-to-Music Generation on MusicBench
1 code implementation • 31 Oct 2023 • Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria
Visual question answering (VQA) is the task of answering questions about an image.
1 code implementation • 5 Jul 2023 • Deepanway Ghosal, Yew Ken Chia, Navonil Majumder, Soujanya Poria
Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills.
no code implementations • 19 May 2023 • Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer
The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table.
1 code implementation • 24 Apr 2023 • Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks.
Ranked #5 on Audio Generation on AudioCaps (FAD metric)
1 code implementation • 29 Oct 2022 • Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria
We show the efficacy of our proposed approach in different tasks -- abductive reasoning, commonsense question answering, science question answering, and sentence completion.
Ranked #2 on Sentence Completion on HellaSwag
1 code implementation • 6 Oct 2022 • Siqi Shen, Deepanway Ghosal, Navonil Majumder, Henry Lim, Rada Mihalcea, Soujanya Poria
Our results show that the proposed pre-training objectives are effective at adapting the pre-trained T5-Large model for the contextual commonsense inference task.
Ranked #1 on Multiview Contextual Commonsense Inference on CICERO (using extra training data)
no code implementations • 31 Aug 2022 • Deepanway Ghosal, Somak Aditya, Monojit Choudhury
The Natural Language Inference (NLI) task often requires reasoning over multiple steps to reach the conclusion.
1 code implementation • ACL 2022 • Deepanway Ghosal, Siqi Shen, Navonil Majumder, Rada Mihalcea, Soujanya Poria
This paper addresses the problem of dialogue reasoning with contextualized commonsense inference.
Ranked #1 on Answer Generation on CICERO
1 code implementation • EMNLP 2021 • Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Sentence order prediction is the task of finding the correct order of sentences in a randomly ordered document.
1 code implementation • 22 Jun 2021 • Navonil Majumder, Deepanway Ghosal, Devamanyu Hazarika, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria
We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics.
1 code implementation • SIGDIAL (ACL) 2021 • Deepanway Ghosal, Pengfei Hong, Siqi Shen, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Commonsense inference to understand and explain human language is a fundamental research problem in natural language processing.
1 code implementation • 22 Dec 2020 • Soujanya Poria, Navonil Majumder, Devamanyu Hazarika, Deepanway Ghosal, Rishabh Bhardwaj, Samson Yu Bai Jian, Pengfei Hong, Romila Ghosh, Abhinaba Roy, Niyati Chhaya, Alexander Gelbukh, Rada Mihalcea
We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines.
Ranked #1 on Recognizing Emotion Cause in Conversations on RECCON
no code implementations • 11 Dec 2020 • Abhinaba Roy, Deepanway Ghosal, Erik Cambria, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Zero-shot learning -- the problem of training and testing on completely disjoint sets of classes -- relies greatly on the ability to transfer knowledge from train classes to test classes.
no code implementations • 19 Nov 2020 • Hui Chen, Deepanway Ghosal, Navonil Majumder, Amir Hussain, Soujanya Poria
Persuasion aims at forming one's opinion and action via a series of persuasive messages containing the persuader's strategies.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Deepanway Ghosal, Navonil Majumder, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria
In this paper, we address the task of utterance-level emotion recognition in conversations using commonsense knowledge.
Ranked #12 on Emotion Recognition in Conversation on DailyDialog
1 code implementation • EMNLP 2020 • Navonil Majumder, Pengfei Hong, Shanshan Peng, Jiankun Lu, Deepanway Ghosal, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria
Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure, where all the emotions are treated uniformly.
2 code implementations • 29 Sep 2020 • Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Most of these approaches account for the context for effective understanding.
no code implementations • 26 May 2020 • Deepanway Ghosal, Maheshkumar H. Kolekar
Visual interest and affect prediction is an interesting area of research in computer vision.
1 code implementation • ACL 2020 • Deepanway Ghosal, Devamanyu Hazarika, Abhinaba Roy, Navonil Majumder, Rada Mihalcea, Soujanya Poria
Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis.
2 code implementations • IJCNLP 2019 • Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, Alexander Gelbukh
Emotion recognition in conversation (ERC) has recently received much attention from researchers due to its potential for widespread applications in diverse areas, such as health care, education, and human resources.
Ranked #1 on Emotion Recognition in Conversation on SEMAINE
no code implementations • NAACL 2019 • Md. Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya
In this paper, we present a deep multi-task learning framework that jointly performs sentiment and emotion analysis.
1 code implementation • EMNLP 2018 • Deepanway Ghosal, Md. Shad Akhtar, Dushyant Chauhan, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya
We evaluate our proposed approach on two multi-modal sentiment analysis benchmark datasets, viz.
Ranked #7 on Multimodal Sentiment Analysis on MOSI
no code implementations • 3 Aug 2018 • Md. Shad Akhtar, Deepanway Ghosal, Asif Ekbal, Pushpak Bhattacharyya, Sadao Kurohashi
In this paper, through a multi-task ensemble framework, we address three problems of emotion and sentiment analysis, i.e., "emotion classification & intensity", "valence, arousal & dominance for emotion", and "valence & arousal for sentiment".
no code implementations • EMNLP 2017 • Md. Shad Akhtar, Abhishek Kumar, Deepanway Ghosal, Asif Ekbal, Pushpak Bhattacharyya
In this paper, we propose a novel method for combining deep learning and classical feature-based models using a Multi-Layer Perceptron (MLP) network for financial sentiment analysis.
no code implementations • SEMEVAL 2017 • Deepanway Ghosal, Shobhit Bhatnagar, Md. Shad Akhtar, Asif Ekbal, Pushpak Bhattacharyya
In this paper, we propose an ensemble-based model that combines state-of-the-art deep learning sentiment analysis algorithms, such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), with feature-based models to identify optimistic or pessimistic sentiments associated with companies and stocks in financial texts.