Search Results for author: Debjit Paul

Found 15 papers, 11 papers with code

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

no code implementations • 7 Aug 2024 • Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, Freya Behrens, Giacomo Orsi, Giovanni Piccioli, Hadrien Sevel, Louis Coulon, Manuela Pineros-Rodriguez, Marin Bonnassies, Pierre Hellich, Puck van Gerwen, Sankalp Gambhir, Solal Pirelli, Thomas Blanchard, Timothée Callens, Toni Abi Aoun, Yannick Calvino Alonso, Yuri Cho, Alberto Chiappa, Antonio Sclocchi, Étienne Bruno, Florian Hofhammer, Gabriel Pescia, Geovani Rizk, Leello Dadi, Lucas Stoffl, Manoel Horta Ribeiro, Matthieu Bovel, Yueyang Pan, Aleksandra Radenovic, Alexandre Alahi, Alexander Mathis, Anne-Florence Bitbol, Boi Faltings, Cécile Hébert, Devis Tuia, François Maréchal, George Candea, Giuseppe Carleo, Jean-Cédric Chappelier, Nicolas Flammarion, Jean-Marie Fürbringer, Jean-Philippe Pellet, Karl Aberer, Lenka Zdeborová, Marcel Salathé, Martin Jaggi, Martin Rajman, Mathias Payer, Matthieu Wyart, Michael Gastpar, Michele Ceriotti, Ola Svensson, Olivier Lévêque, Paolo Ienne, Rachid Guerraoui, Robert West, Sanidhya Kashyap, Valerio Piazza, Viesturs Simanis, Viktor Kuncak, Volkan Cevher, Philippe Schwaller, Sacha Friedli, Patrick Jermann, Tanja Kaser, Antoine Bosselut

We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses.
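As a rough illustration of what such a measurement involves, the sketch below scores a hypothetical assistant on a tiny set of assessment questions. The assistant wrapper, the questions, and the exact-match grading rule are all placeholder assumptions, not the study's actual protocol.

```python
# Sketch of an assessment-style evaluation loop. Everything here is a
# placeholder (the assistant wrapper, the questions, exact-match grading);
# it is not the study's actual methodology.

def ask_assistant(prompt: str) -> str:
    """Hypothetical wrapper around an AI assistant; returns its answer."""
    raise NotImplementedError("plug in a real model call")

questions = [
    {"prompt": "What is the derivative of x**2?", "answer": "2*x"},
    {"prompt": "Compute 7 * 8.", "answer": "56"},
]

def pass_rate(questions) -> float:
    """Fraction of questions answered correctly under exact-match grading."""
    correct = sum(
        ask_assistant(q["prompt"]).strip() == q["answer"] for q in questions
    )
    return correct / len(questions)
```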

A Logical Fallacy-Informed Framework for Argument Generation

1 code implementation • 7 Aug 2024 • Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings

Despite the remarkable performance of Large Language Models (LLMs), they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation.

Logical Fallacies, Misinformation

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning

no code implementations • 21 Feb 2024 • Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings

In this paper, we perform a causal mediation analysis on twelve LLMs to examine how the intermediate reasoning steps generated by an LLM influence its final outcome, and we find that LLMs do not reliably use their intermediate reasoning steps when generating an answer.

counterfactual
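The intervention behind this analysis can be caricatured in a few lines: hold the question fixed, swap the original chain of thought for a corrupted one, and see how often the answer moves. The `generate` call and the example fields below are placeholders, not the paper's implementation.

```python
# Caricature of the chain-of-thought intervention: fix the question, swap in
# a corrupted reasoning chain, and measure how often the answer changes.
# `generate` is a hypothetical LM call, not the paper's implementation.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM call")

def answer_with_cot(question: str, cot: str) -> str:
    # Condition the final answer on a fixed chain of thought.
    return generate(f"{question}\nReasoning: {cot}\nAnswer:")

def reliance_rate(examples) -> float:
    """How often the answer moves when the reasoning is corrupted.

    A low rate suggests the model ignores its own intermediate steps.
    """
    changed = sum(
        answer_with_cot(ex["question"], ex["cot"])
        != answer_with_cot(ex["question"], ex["corrupted_cot"])
        for ex in examples
    )
    return changed / len(examples)
```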

Exploring Defeasibility in Causal Reasoning

no code implementations • 6 Jan 2024 • Shaobo Cui, Lazar Milikic, Yiyang Feng, Mete Ismayilzada, Debjit Paul, Antoine Bosselut, Boi Faltings

CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters.
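For readers checking the arithmetic, the quoted relative improvement follows directly from the two absolute scores:

```python
# 69.7% relative improvement follows from the absolute scores in the text.
old, new = 47.2, 80.1
print(f"{(new - old) / old:.1%}")  # -> 69.7%
```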

CRAB: Assessing the Strength of Causal Relationships Between Real-world Events

1 code implementation • 7 Nov 2023 • Angelika Romanou, Syrielle Montariol, Debjit Paul, Leo Laugier, Karl Aberer, Antoine Bosselut

In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives.

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

1 code implementation • 23 Oct 2023 • Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut

Recent efforts in commonsense reasoning research in natural language processing (NLP) have yielded a considerable number of new datasets and benchmarks.

Benchmarking

REFINER: Reasoning Feedback on Intermediate Representations

1 code implementation • 4 Apr 2023 • Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting.
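The feedback loop the title describes can be sketched generically: a critic inspects the generator's intermediate reasoning and returns textual feedback until it is satisfied. All names below are placeholders for illustration, not the REFINER models themselves.

```python
# Generic generator-critic refinement loop in the spirit of this setup:
# a critic inspects intermediate reasoning and the generator revises until
# the critic is satisfied. All names are placeholders, not REFINER's code.

def generator(question: str, feedback: str | None = None) -> str:
    raise NotImplementedError("LM that produces intermediate reasoning")

def critic(question: str, reasoning: str) -> str | None:
    raise NotImplementedError("returns feedback text, or None if satisfied")

def refine(question: str, max_rounds: int = 3) -> str:
    reasoning = generator(question)
    for _ in range(max_rounds):
        feedback = critic(question, reasoning)
        if feedback is None:  # critic has no objections
            break
        reasoning = generator(question, feedback)
    return reasoning
```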

Language Model Decoding as Likelihood-Utility Alignment

1 code implementation • 13 Oct 2022 • Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kiciman, Boi Faltings, Robert West

Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm.

Language Modelling, Text Generation
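A minimal version of that diagnostic, assuming one already has per-prediction likelihoods and a task utility metric in hand, is just a correlation; the numbers below are made up for illustration.

```python
# Toy version of the likelihood-utility diagnostic: correlate per-prediction
# model likelihood with a task utility metric. The numbers are made up;
# real values would come from a model and a task scorer.
import numpy as np

log_likelihoods = np.array([-3.2, -1.1, -4.5, -0.8, -2.6])
utilities = np.array([0.0, 1.0, 0.0, 1.0, 0.5])

# Strong positive correlation: likelihood-maximizing decoding (e.g., beam
# search) is aligned with the task. Weak correlation: consider sampling or
# utility-aware decoding instead.
r = np.corrcoef(log_likelihoods, utilities)[0, 1]
print(f"Pearson r = {r:.2f}")
```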

Generating Hypothetical Events for Abductive Inference

1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Debjit Paul, Anette Frank

This work offers the first study of how such knowledge impacts the Abductive NLI task, which consists of choosing the more likely explanation for given observations.

Language Modelling
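The task format referenced above reduces to comparing model scores for two candidate narratives built from the same pair of observations. In the sketch below, `log_prob` is a hypothetical LM scoring function, not the paper's model.

```python
# Toy rendering of the Abductive NLI format: given observations obs1 and
# obs2, pick the hypothesis that makes the full narrative more likely.
# `log_prob` is a hypothetical LM scoring function.

def log_prob(text: str) -> float:
    raise NotImplementedError("return the LM log-probability of `text`")

def choose_explanation(obs1: str, obs2: str, h1: str, h2: str) -> str:
    def score(h: str) -> float:
        return log_prob(f"{obs1} {h} {obs2}")
    return h1 if score(h1) >= score(h2) else h2
```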

COINS: Dynamically Generating COntextualized Inference Rules for Narrative Story Completion

1 code implementation • ACL 2021 • Debjit Paul, Anette Frank

Despite recent successes of large pre-trained language models in solving reasoning tasks, their inference capabilities remain opaque.

Sentence, Story Completion

CO-NNECT: A Framework for Revealing Commonsense Knowledge Paths as Explicitations of Implicit Knowledge in Texts

1 code implementation • IWCS (ACL) 2021 • Maria Becker, Katharina Korfhage, Debjit Paul, Anette Frank

We conduct evaluations on two argumentative datasets and show that a combination of the two model types generates meaningful, high-quality knowledge paths between sentences that reveal implicit knowledge conveyed in text.

Relation

Social Commonsense Reasoning with Multi-Head Knowledge Attention

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Debjit Paul, Anette Frank

Notably, we are, to the best of our knowledge, the first to demonstrate that a model that learns to perform counterfactual reasoning helps predict the best explanation in an abductive reasoning task.

counterfactual, Counterfactual Reasoning +1

Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs

1 code implementation • NAACL 2019 • Debjit Paul, Anette Frank

To make machines better understand sentiments, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment.

Common Sense Reasoning
