1 code implementation • 24 Feb 2025 • Rohit Saxena, Pasquale Minervini, Frank Keller
We benchmark state-of-the-art Multimodal Large Language Models (MLLMs) on PosterSum and demonstrate that they struggle to accurately interpret and summarize scientific posters.
1 code implementation • 12 Feb 2025 • Dongqi Liu, Chenxi Whitehouse, Xi Yu, Louis Mahon, Rohit Saxena, Zheng Zhao, Yifu Qiu, Mirella Lapata, Vera Demberg
Transforming recorded videos into concise and accurate textual summaries is a growing challenge in multimodal learning.
no code implementations • 7 Feb 2025 • Rohit Saxena, Aryo Pradipta Gema, Pasquale Minervini
Understanding time from visual representations is a fundamental cognitive skill, yet it remains a challenge for multimodal large language models (MLLMs).
no code implementations • 3 Jan 2025 • Rohit Saxena, Hao Tang, Frank Keller
Training transformer-based encoder-decoder models for long document summarization poses a significant challenge due to the quadratic memory consumption during training.
1 code implementation • 12 Aug 2024 • Rohit Saxena, Frank Keller
Movie screenplay summarization is challenging, as it requires an understanding of long input contexts and various elements unique to movies.
3 code implementations • 6 Jun 2024 • Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
For example, we find that 57% of the analysed questions in the Virology subset contain errors.
no code implementations • 8 Apr 2024 • Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text.
1 code implementation • 4 Apr 2024 • Rohit Saxena, Frank Keller
Abstractive summarization for long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models.
no code implementations • 28 Nov 2019 • Ramit Pahwa, Manoj Ghuhan Arivazhagan, Ankur Garg, Siddarth Krishnamoorthy, Rohit Saxena, Sunav Choudhary
Designing and training a CNN architecture that does well on all three metrics is highly non-trivial and can be very time-consuming if done by hand.
no code implementations • WS 2018 • Rohit Saxena, Savita Bhat, Niranjan Pedanekar
It is an emotion detection task on dialogues in the EmotionLines dataset.