Search Results for author: Spyridon Mouselinos

Found 5 papers, 0 papers with code

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

no code implementations • 6 Feb 2024 • Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored.

Mathematical Reasoning, Variable Selection

A Simple, Yet Effective Approach to Finding Biases in Code Generation

no code implementations • 31 Oct 2022 • Spyridon Mouselinos, Mateusz Malinowski, Henryk Michalewski

This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones, which can reduce the quality of the generated code under specific circumstances.

Causal Language Modeling, Code Generation, +2

Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

no code implementations • 24 Feb 2022 • Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

Visual question answering provides a convenient framework for testing the model's abilities by interrogating the model through questions about the scene.

Benchmarking, Question Answering, +2

Measuring CLEVRness: Black-box Testing of Visual Reasoning Models

no code implementations • ICLR 2022 • Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

To answer such a question, we extend the visual question answering framework and propose the following behavioral test in the form of a two-player game.

Benchmarking, Question Answering, +2

MAIN: Multihead-Attention Imputation Networks

no code implementations • 10 Feb 2021 • Spyridon Mouselinos, Kyriakos Polymenakos, Antonis Nikitakis, Konstantinos Kyriakopoulos

The problem of missing data, usually absent in curated and competition-standard datasets, is an unfortunate reality for most machine learning models used in industry applications.

Imputation
