no code implementations • 7 Aug 2024 • Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, Freya Behrens, Giacomo Orsi, Giovanni Piccioli, Hadrien Sevel, Louis Coulon, Manuela Pineros-Rodriguez, Marin Bonnassies, Pierre Hellich, Puck van Gerwen, Sankalp Gambhir, Solal Pirelli, Thomas Blanchard, Timothée Callens, Toni Abi Aoun, Yannick Calvino Alonso, Yuri Cho, Alberto Chiappa, Antonio Sclocchi, Étienne Bruno, Florian Hofhammer, Gabriel Pescia, Geovani Rizk, Leello Dadi, Lucas Stoffl, Manoel Horta Ribeiro, Matthieu Bovel, Yueyang Pan, Aleksandra Radenovic, Alexandre Alahi, Alexander Mathis, Anne-Florence Bitbol, Boi Faltings, Cécile Hébert, Devis Tuia, François Maréchal, George Candea, Giuseppe Carleo, Jean-Cédric Chappelier, Nicolas Flammarion, Jean-Marie Fürbringer, Jean-Philippe Pellet, Karl Aberer, Lenka Zdeborová, Marcel Salathé, Martin Jaggi, Martin Rajman, Mathias Payer, Matthieu Wyart, Michael Gastpar, Michele Ceriotti, Ola Svensson, Olivier Lévêque, Paolo Ienne, Rachid Guerraoui, Robert West, Sanidhya Kashyap, Valerio Piazza, Viesturs Simanis, Viktor Kuncak, Volkan Cevher, Philippe Schwaller, Sacha Friedli, Patrick Jermann, Tanja Kaser, Antoine Bosselut
We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses.
1 code implementation • 7 Aug 2024 • Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings
Despite the remarkable performance of Large Language Models (LLMs), they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation.
no code implementations • 21 Feb 2024 • Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings
In this paper, we perform a causal mediation analysis on twelve LLMs to examine how the intermediate reasoning steps they generate influence the final outcome. We find that LLMs do not reliably use their intermediate reasoning steps when generating an answer.
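A minimal sketch of the kind of intervention a causal mediation analysis performs here (not the paper's code): hold the question fixed, swap the model's own chain of thought for a corrupted one, and check whether the answer changes. `generate` is a hypothetical stand-in for any LLM call.

```python
# A minimal sketch, not the paper's code: `generate` is a hypothetical
# stand-in for any LLM call.
def generate(prompt: str) -> str:
    return "..."  # placeholder for a real model call

question = "A train leaves at 3pm and travels for 2 hours. When does it arrive?"

# 1. Elicit a chain of thought, then an answer conditioned on it.
cot = generate(question + "\nLet's think step by step:")
answer = generate(question + "\n" + cot + "\nAnswer:")

# 2. Intervene: condition the answer on a corrupted chain of thought instead.
corrupted_cot = "Irrelevant reasoning that should change nothing if unused."
answer_after_intervention = generate(question + "\n" + corrupted_cot + "\nAnswer:")

# If the answer rarely changes under such interventions, the model is not
# reliably using its intermediate reasoning steps -- the effect measured above.
print(answer == answer_after_intervention)
```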
no code implementations • 6 Jan 2024 • Shaobo Cui, Lazar Milikic, Yiyang Feng, Mete Ismayilzada, Debjit Paul, Antoine Bosselut, Boi Faltings
CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters.
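For the arithmetic behind those figures: moving from 47.2% to 80.1% is a 69.7% gain relative to the baseline.

```python
baseline, cesar = 47.2, 80.1                   # scores quoted above
print(f"{(cesar - baseline) / baseline:.1%}")  # -> 69.7%
```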
1 code implementation • 7 Nov 2023 • Angelika Romanou, Syrielle Montariol, Debjit Paul, Leo Laugier, Karl Aberer, Antoine Bosselut
In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives.
1 code implementation • 23 Oct 2023 • Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut
Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks.
2 code implementations • 2 Aug 2023 • Martin Josifoski, Lars Klein, Maxime Peyrard, Nicolas Baldwin, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West
To support rapid and rigorous research, we introduce the aiFlows library, which embodies Flows.
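As an illustration of the underlying idea only (this toy is not the aiFlows API; see the library itself for the real interfaces): a Flow is a composable unit that consumes and produces messages, and larger systems are built by chaining such units.

```python
# Illustrative toy only -- NOT the aiFlows API. It sketches the core idea:
# Flows are composable units that exchange messages.
from typing import Callable, Dict

Flow = Callable[[Dict], Dict]

def compose(*flows: Flow) -> Flow:
    """Chain flows so each one's output message feeds the next."""
    def composed(message: Dict) -> Dict:
        for flow in flows:
            message = flow(message)
        return message
    return composed

def draft(m: Dict) -> Dict:
    return {**m, "text": f"draft for task: {m['task']}"}

def critique(m: Dict) -> Dict:
    return {**m, "feedback": f"critique of: {m['text']}"}

pipeline = compose(draft, critique)
print(pipeline({"task": "summarize a paper"}))
```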
1 code implementation • 4 Apr 2023 • Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting.
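A minimal sketch of chain-of-thought prompting as described above: a few-shot prompt demonstrates intermediate inferences so the model emits its own before the final answer. `llm` is a hypothetical stand-in for any text-generation call, not a real API.

```python
# A minimal sketch of chain-of-thought prompting; `llm` is a hypothetical
# stand-in for any text-generation call, not a real API.
def llm(prompt: str) -> str:
    return "The cafeteria used 20 of 23 apples, leaving 3; 3 + 6 = 9. The answer is 9."

prompt = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. "
    "The answer is 11.\n"
    "Q: A cafeteria had 23 apples, used 20, and bought 6 more. How many now?\n"
    "A:"  # the model continues with intermediate inferences, then the answer
)
print(llm(prompt))
```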
1 code implementation • 13 Oct 2022 • Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kiciman, Boi Faltings, Robert West
Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm.
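A hedged sketch of that correlation analysis: pair each generated output's model likelihood with its measured task utility and compute a correlation coefficient. The numbers below are made up purely for illustration.

```python
# A hedged sketch of the likelihood-utility analysis; the numbers below are
# made up purely for illustration.
from scipy.stats import pearsonr

log_likelihoods = [-3.2, -5.1, -2.8, -7.4, -4.0]  # model scores per output
utilities       = [0.90, 0.40, 0.95, 0.10, 0.60]  # task quality per output

r, p = pearsonr(log_likelihoods, utilities)
print(f"likelihood-utility correlation: r={r:.2f} (p={p:.3f})")
```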
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Debjit Paul, Anette Frank
This work offers the first study of how such knowledge impacts the Abductive NLI task, which consists of choosing the more likely explanation for given observations.
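A minimal sketch of the Abductive NLI setup, with a hypothetical `score` function standing in for a trained model: given two observations, the system selects the hypothesis that better explains how the first leads to the second.

```python
# A minimal sketch of Abductive NLI; `score` is a hypothetical plausibility
# function standing in for a trained model (e.g., an LM log-probability).
def score(o1: str, hypothesis: str, o2: str) -> float:
    return 1.0 if "lottery" in hypothesis else 0.0  # placeholder heuristic

o1 = "Dotty was in a terrible mood this morning."
o2 = "By noon she was so happy she gave out free coffee."
hypotheses = [
    "Dotty spilled coffee on her shirt.",
    "Dotty found out she had won the lottery.",
]
best = max(hypotheses, key=lambda h: score(o1, h, o2))
print(best)  # the hypothesis that better explains O1 -> O2
```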
1 code implementation • ACL 2021 • Debjit Paul, Anette Frank
Despite recent successes of large pre-trained language models in solving reasoning tasks, their inference capabilities remain opaque.
1 code implementation • IWCS (ACL) 2021 • Maria Becker, Katharina Korfhage, Debjit Paul, Anette Frank
We conduct evaluations on two argumentative datasets and show that a combination of the two model types generates meaningful, high-quality knowledge paths between sentences that reveal implicit knowledge conveyed in text.
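A hedged sketch of the path-finding step, over a tiny hand-made ConceptNet-style graph (the edges here are illustrative, not real ConceptNet data): connect a concept extracted from the premise to a concept from the claim.

```python
# A hedged sketch of knowledge-path finding between two sentences' concepts,
# over a tiny hand-made ConceptNet-style graph (edges are illustrative).
import networkx as nx

g = nx.Graph()
g.add_edge("gun", "violence", relation="RelatedTo")
g.add_edge("violence", "harm", relation="Causes")
g.add_edge("harm", "regulation", relation="MotivatedByGoal")

# "gun" is extracted from the premise, "regulation" from the claim:
path = nx.shortest_path(g, source="gun", target="regulation")
print(" -> ".join(path))  # gun -> violence -> harm -> regulation
```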
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Debjit Paul, Anette Frank
Notably, we are, to the best of our knowledge, the first to demonstrate that a model that learns to perform counterfactual reasoning helps predict the best explanation in an abductive reasoning task.
1 code implementation • NAACL 2019 • Debjit Paul, Anette Frank
To make machines better understand sentiments, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment.
no code implementations • NAACL 2019 • Debjit Paul, Mittul Singh, Michael A. Hedderich, Dietrich Klakow
In our experiments on Chunking and NER, this approach performs more robustly than the baselines.