Search Results for author: Mark Niklas Müller

Found 24 papers, 13 papers with code

Automated Benchmark Generation for Repository-Level Coding Tasks

no code implementations · 10 Mar 2025 · Konstantinos Vergopoulos, Mark Niklas Müller, Martin Vechev

The correctness of generated patches is then evaluated by executing a human-written test suite extracted from the repository after the issue's resolution.

Average Certified Radius is a Poor Metric for Randomized Smoothing

no code implementations · 9 Oct 2024 · Chenhao Sun, Yuhao Mao, Mark Niklas Müller, Martin Vechev

Randomized smoothing is a popular approach for providing certified robustness guarantees against adversarial attacks, and has become an active area of research.

Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

no code implementations · 11 Jul 2024 · Anton Alexandrov, Veselin Raychev, Mark Niklas Müller, Ce Zhang, Martin Vechev, Kristina Toutanova

As open-weight large language models (LLMs) achieve ever more impressive performance across a wide range of tasks in English, practitioners aim to adapt these models to different languages.

SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents

1 code implementation · 18 Jun 2024 · Niels Mündler, Mark Niklas Müller, Jingxuan He, Martin Vechev

We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed specifically for test generation.

Code Generation · Code Repair +1

ConStat: Performance-Based Contamination Detection in Large Language Models

no code implementations · 25 May 2024 · Jasper Dekoninck, Mark Niklas Müller, Martin Vechev

To overcome these limitations, we propose a novel definition of contamination as artificially inflated and non-generalizing benchmark performance instead of the inclusion of benchmark samples in the training data.

DAGER: Exact Gradient Inversion for Large Language Models

1 code implementation · 24 May 2024 · Ivo Petrov, Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

Federated learning works by aggregating locally computed gradients from multiple clients, thus enabling collaborative training without sharing private client data.

Decoder · Federated Learning
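The aggregation step described in the abstract above can be sketched generically. This is a minimal FedAvg-style weighted average of client gradients, purely illustrative (it is not DAGER's inversion attack, and all names here are assumptions):

```python
import numpy as np

def aggregate_gradients(client_grads, client_sizes):
    """FedAvg-style weighted average of per-client gradient vectors,
    weighting each client by its number of local examples."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_grads)            # shape (n_clients, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Two clients with 1 and 3 local examples respectively
g1 = np.array([1.0, 2.0])
g2 = np.array([3.0, 4.0])
agg = aggregate_gradients([g1, g2], client_sizes=[1, 3])
print(agg)  # [2.5 3.5]
```

The server only ever sees the aggregate, which is what makes exact gradient inversion a meaningful privacy question.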

SPEAR: Exact Gradient Inversion of Batches in Federated Learning

no code implementations · 6 Mar 2024 · Dimitar I. Dimitrov, Maximilian Baader, Mark Niklas Müller, Martin Vechev

In this work, we propose SPEAR, the first algorithm reconstructing whole batches with $b > 1$ exactly.

Federated Learning

Evading Data Contamination Detection for Language Models is (too) Easy

2 code implementations · 5 Feb 2024 · Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev

Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another.

Automated Classification of Model Errors on ImageNet

1 code implementation · NeurIPS 2023 · Momchil Peychev, Mark Niklas Müller, Marc Fischer, Martin Vechev

To address this, new label-sets and evaluation protocols have been proposed for ImageNet, showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus to investigating why the remaining errors persist.

Classification · model

Prompt Sketching for Large Language Models

no code implementations · 8 Nov 2023 · Luca Beurer-Kellner, Mark Niklas Müller, Marc Fischer, Martin Vechev

This way, sketching grants users more control over the generation process, e.g., by providing a reasoning framework via intermediate instructions, leading to better overall results.

Arithmetic Reasoning · Benchmarking +3

Expressivity of ReLU-Networks under Convex Relaxations

no code implementations · 7 Nov 2023 · Maximilian Baader, Mark Niklas Müller, Yuhao Mao, Martin Vechev

We show that: (i) more advanced relaxations allow a larger class of univariate functions to be expressed as precisely analyzable ReLU networks, (ii) more precise relaxations can allow exponentially larger solution spaces of ReLU networks encoding the same functions, and (iii) even using the most precise single-neuron relaxations, it is impossible to construct precisely analyzable ReLU networks that express multivariate, convex, monotone CPWL functions.

Understanding Certified Training with Interval Bound Propagation

1 code implementation · 17 Jun 2023 · Yuhao Mao, Mark Niklas Müller, Marc Fischer, Martin Vechev

We then derive necessary and sufficient conditions on weight matrices for IBP bounds to become exact, and demonstrate that these impose strong regularization, explaining the empirically observed trade-off between robustness and accuracy in certified training.
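Interval bound propagation itself is easy to sketch: an input box is pushed through each affine layer using the absolute weight matrix, and the monotone ReLU simply clamps the bounds. A minimal numpy sketch (illustrative only, not the paper's training code; the network weights are made up):

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b.
    The center moves through the layer; the radius scales by |W|."""
    center = (lo + hi) / 2
    radius = (hi - lo) / 2
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lo, hi):
    """ReLU is monotone, so the box maps to [max(lo, 0), max(hi, 0)]."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

W = np.array([[1.0, -1.0], [2.0, 1.0]])
b = np.zeros(2)
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])
lo, hi = ibp_relu(*ibp_affine(lo, hi, W, b))
print(lo, hi)  # [0. 0.] [0.2 0.3]
```

Because the radius only grows through layers, IBP bounds are cheap but loose unless the weights are regularized, which connects to the trade-off discussed in the abstract.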

TAPS: Connecting Certified and Adversarial Training

2 code implementations · 8 May 2023 · Yuhao Mao, Mark Niklas Müller, Marc Fischer, Martin Vechev

Training certifiably robust neural networks remains a notoriously hard problem.

Efficient Certified Training and Robustness Verification of Neural ODEs

1 code implementation · 9 Mar 2023 · Mustafa Zeqiri, Mark Niklas Müller, Marc Fischer, Martin Vechev

Neural Ordinary Differential Equations (NODEs) are a novel neural architecture built around initial value problems with learned dynamics that are solved during inference.

Time Series · Time Series Forecasting

First Three Years of the International Verification of Neural Networks Competition (VNN-COMP)

no code implementations · 14 Jan 2023 · Christopher Brix, Mark Niklas Müller, Stanley Bak, Taylor T. Johnson, Changliu Liu

This paper presents a summary and meta-analysis of the first three iterations of the annual International Verification of Neural Networks Competition (VNN-COMP) held in 2020, 2021, and 2022.

Image Classification · reinforcement-learning +1

The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results

1 code implementation · 20 Dec 2022 · Mark Niklas Müller, Christopher Brix, Stanley Bak, Changliu Liu, Taylor T. Johnson

This report summarizes the 3rd International Verification of Neural Networks Competition (VNN-COMP 2022), held as a part of the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), which was collocated with the 34th International Conference on Computer-Aided Verification (CAV).

Certified Training: Small Boxes are All You Need

1 code implementation · 10 Oct 2022 · Mark Niklas Müller, Franziska Eckert, Marc Fischer, Martin Vechev

To obtain deterministic guarantees of adversarial robustness, specialized training methods are used.

Adversarial Robustness

(De-)Randomized Smoothing for Decision Stump Ensembles

1 code implementation · 27 May 2022 · Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev

Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming.
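For a single one-dimensional stump, the smoothed output under Gaussian input noise has a closed form via the normal CDF, and additive ensembles follow by linearity. A hedged sketch of this idea (illustrative values throughout, not the paper's dynamic-programming implementation):

```python
import math

def smoothed_stump(x, threshold, left_val, right_val, sigma):
    """Exact expectation of a 1-D decision stump under N(0, sigma^2)
    input noise: the probability of landing left of the threshold is
    a single normal-CDF evaluation."""
    p_left = 0.5 * (1.0 + math.erf((threshold - x) / (sigma * math.sqrt(2.0))))
    return left_val * p_left + right_val * (1.0 - p_left)

# At the threshold itself, the noise falls on either side with probability 1/2,
# so the two leaf values average out.
val = smoothed_stump(0.0, threshold=0.0, left_val=-1.0, right_val=1.0, sigma=0.5)
print(val)  # 0.0
```

This is what makes stumps special: no Monte Carlo sampling is needed, so the smoothed model can be evaluated exactly and efficiently.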

Robust and Accurate -- Compositional Architectures for Randomized Smoothing

1 code implementation · 1 Apr 2022 · Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev

Randomized Smoothing (RS) is considered the state-of-the-art approach to obtain certifiably robust models for challenging tasks.

Abstract Interpretation of Fixpoint Iterators with Applications to Neural Networks

1 code implementation · 14 Oct 2021 · Mark Niklas Müller, Marc Fischer, Robin Staab, Martin Vechev

We present a new abstract interpretation framework for the precise over-approximation of numerical fixpoint iterators.

Boosting Randomized Smoothing with Variance Reduced Classifiers

1 code implementation · ICLR 2022 · Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev

Randomized Smoothing (RS) is a promising method for obtaining robustness certificates by evaluating a base model under noise.
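"Evaluating a base model under noise" is usually estimated by Monte Carlo: sample Gaussian perturbations of the input, query the base model on each, and take the majority class. A toy sketch (the base classifier, seed, and parameters here are illustrative assumptions, not the paper's method):

```python
import numpy as np

def smoothed_predict(base_model, x, sigma, n_samples, rng):
    """Monte Carlo estimate of the smoothed classifier's prediction:
    the majority class of base_model over Gaussian perturbations of x."""
    counts = {}
    for _ in range(n_samples):
        label = base_model(x + rng.normal(0.0, sigma, size=x.shape))
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)

# Toy base classifier: the sign of the first coordinate
base = lambda z: int(z[0] > 0)
rng = np.random.default_rng(0)
pred = smoothed_predict(base, np.array([1.0, 0.0]), sigma=0.1, n_samples=100, rng=rng)
print(pred)  # 1
```

The variance of the base model's vote under noise directly controls how tight the resulting certificate is, which is the quantity the paper's variance-reduction approach targets.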
