no code implementations • 10 Mar 2025 • Konstantinos Vergopoulos, Mark Niklas Müller, Martin Vechev
The correctness of generated patches is then evaluated by executing a human-written test suite extracted from the repository after the issue's resolution.
no code implementations • 9 Oct 2024 • Chenhao Sun, Yuhao Mao, Mark Niklas Müller, Martin Vechev
Randomized smoothing is a popular approach for providing certified robustness guarantees against adversarial attacks, and has become an active area of research.
no code implementations • 6 Oct 2024 • Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Müller
Here, we extend this work to the challenging and largely unexplored domain of robotics systems.
no code implementations • 11 Jul 2024 • Anton Alexandrov, Veselin Raychev, Mark Niklas Müller, Ce Zhang, Martin Vechev, Kristina Toutanova
As open-weight large language models (LLMs) achieve ever more impressive performance across a wide range of tasks in English, practitioners aim to adapt these models to different languages.
1 code implementation • 18 Jun 2024 • Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev
We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed specifically for test generation.
no code implementations • 25 May 2024 • Jasper Dekoninck, Mark Niklas Müller, Martin Vechev
To overcome these limitations, we propose a novel definition of contamination as artificially inflated and non-generalizing benchmark performance instead of the inclusion of benchmark samples in the training data.
1 code implementation • 24 May 2024 • Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev
Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data.
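The aggregation step described above can be sketched minimally as a weighted average of client gradients (FedAvg-style); the function name and the three-client example are hypothetical, not from the paper:

```python
import numpy as np

def aggregate_gradients(client_grads, client_sizes):
    """Weighted average of locally computed gradients (FedAvg-style).

    client_grads: list of per-client gradient vectors (np.ndarray)
    client_sizes: number of local training examples per client
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()  # weight each client by its share of the data
    return sum(w * g for w, g in zip(weights, client_grads))

# Hypothetical example: three clients training a 2-parameter model
grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [10, 10, 20]
print(aggregate_gradients(grads, sizes))  # [0.75 0.75]
```

The server only ever sees these aggregated gradients, which is what gradient-leakage attacks such as the one studied here try to invert.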
no code implementations • 11 Mar 2024 • Stefan Balauca, Mark Niklas Müller, Yuhao Mao, Maximilian Baader, Marc Fischer, Martin Vechev
In this work, we theoretically show that Gaussian Loss Smoothing (GLS) can alleviate these issues.
no code implementations • 6 Mar 2024 • Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev
In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b > 1$ exactly.
2 code implementations • 5 Feb 2024 • Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev
Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another.
1 code implementation • NeurIPS 2023 • Momchil Peychev, Mark Niklas Müller, Marc Fischer, Martin Vechev
To address this, new label-sets and evaluation protocols have been proposed for ImageNet, showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus to investigating why the remaining errors persist.
no code implementations • 8 Nov 2023 • Luca Beurer-Kellner, Mark Niklas Müller, Marc Fischer, Martin Vechev
This way, sketching grants users more control over the generation process, e.g., by providing a reasoning framework via intermediate instructions, leading to better overall results.
no code implementations • 7 Nov 2023 • Maximilian Baader, Mark Niklas Müller, Yuhao Mao, Martin Vechev
We show that: (i) more advanced relaxations allow a larger class of univariate functions to be expressed as precisely analyzable ReLU networks, (ii) more precise relaxations can allow exponentially larger solution spaces of ReLU networks encoding the same functions, and (iii) even using the most precise single-neuron relaxations, it is impossible to construct precisely analyzable ReLU networks that express multivariate, convex, monotone CPWL functions.
1 code implementation • 17 Jun 2023 • Yuhao Mao, Mark Niklas Müller, Marc Fischer, Martin Vechev
We then derive necessary and sufficient conditions on weight matrices for IBP bounds to become exact, and demonstrate that these impose strong regularization, explaining the empirically observed trade-off between robustness and accuracy in certified training.
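For context, interval bound propagation (IBP) pushes a box of inputs through each layer. A minimal sketch of one affine layer followed by ReLU, with hypothetical bounds and weights (not from the paper):

```python
import numpy as np

def ibp_linear(l, u, W, b):
    """Propagate the interval [l, u] through x -> Wx + b."""
    c, r = (l + u) / 2, (u - l) / 2  # box center and radius
    c_out = W @ c + b
    r_out = np.abs(W) @ r            # radius grows with |W|
    return c_out - r_out, c_out + r_out

def ibp_relu(l, u):
    """ReLU is monotone, so it maps interval endpoints to endpoints."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# Hypothetical input box and layer
l, u = np.array([-1.0, 0.0]), np.array([1.0, 2.0])
W = np.array([[1.0, -1.0], [0.5, 0.5]])
b = np.zeros(2)
l1, u1 = ibp_relu(*ibp_linear(l, u, W, b))
print(l1, u1)  # [0. 0.] [1.  1.5]
```

The `np.abs(W)` term is the source of the looseness the paper studies: the bounds are exact only for weight matrices with special structure.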
2 code implementations • 8 May 2023 • Yuhao Mao, Mark Niklas Müller, Marc Fischer, Martin Vechev
Training certifiably robust neural networks remains a notoriously hard problem.
1 code implementation • 9 Mar 2023 • Mustafa Zeqiri, Mark Niklas Müller, Marc Fischer, Martin Vechev
Neural Ordinary Differential Equations (NODEs) are a novel neural architecture, built around initial value problems with learned dynamics which are solved during inference.
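A NODE's forward pass is the solution of an initial value problem, as the sentence above describes. A minimal sketch with a fixed-step Euler solver and a toy hand-written dynamics function standing in for the learned network:

```python
import numpy as np

def node_forward(z0, dynamics, t0=0.0, t1=1.0, steps=100):
    """Solve dz/dt = f(z, t) from t0 to t1 with fixed-step Euler.

    In a real NODE, `dynamics` is a learned neural network and the
    solver is typically adaptive; this toy version is for illustration.
    """
    z, t = np.asarray(z0, dtype=float), t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * dynamics(z, t)  # one Euler step
        t += h
    return z

# Toy linear dynamics f(z, t) = -z; the exact solution at t=1 is z0 * e^{-1}
z1 = node_forward([1.0], lambda z, t: -z)
print(z1)  # ~ [0.366], close to exp(-1) ~ 0.368
```

Certifying such models is hard precisely because perturbations of the initial value propagate through every solver step.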
no code implementations • 14 Jan 2023 • Christopher Brix, Mark Niklas Müller, Stanley Bak, Taylor T. Johnson, Changliu Liu
This paper presents a summary and meta-analysis of the first three iterations of the annual International Verification of Neural Networks Competition (VNN-COMP) held in 2020, 2021, and 2022.
1 code implementation • 20 Dec 2022 • Mark Niklas Müller, Christopher Brix, Stanley Bak, Changliu Liu, Taylor T. Johnson
This report summarizes the 3rd International Verification of Neural Networks Competition (VNN-COMP 2022), held as a part of the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), which was collocated with the 34th International Conference on Computer-Aided Verification (CAV).
1 code implementation • 10 Oct 2022 • Mark Niklas Müller, Franziska Eckert, Marc Fischer, Martin Vechev
To obtain deterministic guarantees of adversarial robustness, specialized training methods are used.
1 code implementation • 27 May 2022 • Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev
Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming.
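To illustrate why stumps admit exact evaluation: under Gaussian input noise, the probability of landing on either side of a stump's threshold is a Gaussian CDF value, so an additive ensemble's smoothed output has a closed form. This sketch covers only the additive-regression case, not the paper's dynamic program over classification votes; all names and the two-stump example are hypothetical:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def smoothed_stump_ensemble(x, stumps, sigma):
    """Exact expected output of an additive stump ensemble under
    N(x, sigma^2 I) input noise.

    Each stump (j, t, a, b) returns a if x[j] <= t else b, so its
    expectation is a mixture of the two leaf values weighted by the
    Gaussian CDF at the threshold -- no sampling needed.
    """
    total = 0.0
    for j, t, a, b in stumps:
        p = norm_cdf((t - x[j]) / sigma)  # P(x[j] + noise <= t)
        total += a * p + b * (1.0 - p)
    return total

# Hypothetical two-stump ensemble on a 1-D input
stumps = [(0, 0.0, 0.0, 1.0), (0, 1.0, 0.0, 1.0)]
print(smoothed_stump_ensemble([0.0], stumps, sigma=1.0))  # ~ 0.659
```

Arbitrary base models offer no such closed form, which is why prior work falls back on Monte-Carlo estimation.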
1 code implementation • 1 Apr 2022 • Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev
Randomized Smoothing (RS) is considered the state-of-the-art approach to obtain certifiably robust models for challenging tasks.
1 code implementation • 14 Oct 2021 • Mark Niklas Müller, Marc Fischer, Robin Staab, Martin Vechev
We present a new abstract interpretation framework for the precise over-approximation of numerical fixpoint iterators.
1 code implementation • ICLR 2022 • Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev
Randomized Smoothing (RS) is a promising method for obtaining robustness certificates by evaluating a base model under noise.
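"Evaluating a base model under noise" can be sketched with a Monte-Carlo estimate in the spirit of Cohen et al. (2019); this is a simplified illustration without the rigorous confidence bounds a real certificate requires, and the toy base classifier is hypothetical:

```python
import numpy as np
from statistics import NormalDist

def smoothed_predict(f, x, sigma, n=1000, rng=None):
    """Monte-Carlo estimate of the smoothed classifier
    g(x) = argmax_c P(f(x + noise) = c), with a rough radius
    sigma * Phi^{-1}(p_hat); a real certificate would lower-bound
    p_hat with a confidence interval.
    """
    rng = rng or np.random.default_rng(0)
    counts = {}
    for _ in range(n):
        c = f(x + sigma * rng.standard_normal(x.shape))
        counts[c] = counts.get(c, 0) + 1
    top = max(counts, key=counts.get)
    p_hat = min(counts[top] / n, 1.0 - 1.0 / n)  # keep Phi^{-1} finite
    radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
    return top, radius

# Toy base classifier: sign of the first coordinate
f = lambda x: int(x[0] > 0)
label, radius = smoothed_predict(f, np.array([1.0]), sigma=0.25)
print(label, radius)  # label 1 with a positive certified radius
```

The certificate degrades as the estimated top-class probability falls toward 1/2, which is the tension ensemble-based variants of RS aim to ease.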
no code implementations • 5 Mar 2021 • Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, Martin Vechev
Formal verification of neural networks is critical for their safe adoption in real-world applications.