Search Results for author: Sebastian Schelter

Found 15 papers, 11 papers with code

AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model

1 code implementation • 6 Sep 2024 • Zeyu Zhang, Paul Groth, Iacer Calixto, Sebastian Schelter

Furthermore, our approach exhibits major cost benefits: the average prediction quality of AnyMatch is within 4.4% of the state-of-the-art method MatchGPT with the proprietary trillion-parameter model GPT-4, yet AnyMatch requires four orders of magnitude fewer parameters and incurs a 3,899 times lower inference cost (in dollars per 1,000 tokens).

Attribute AutoML +3

Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"

1 code implementation • 30 Apr 2024 • Stefan Grafberger, Paul Groth, Sebastian Schelter

Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings.

Hierarchical Forecasting at Scale

1 code implementation • 19 Oct 2023 • Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke

We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level.

Time Series

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

1 code implementation • 6 Jul 2023 • Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang

There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial-time algorithm that, given a retrieval-augmented model with an additive utility function and a validation set, exactly computes the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function.

Imputation Question Answering +1

On the Impact of Outlier Bias on User Clicks

1 code implementation • 1 May 2023 • Fatemeh Sarvi, Ali Vardasbi, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke

We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called outlier-aware position-based model (OPBM).

counterfactual Learning-To-Rank +1

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation • 23 Apr 2022 • Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

BIG-bench Machine Learning Fairness

Efficiently Maintaining Next Basket Recommendations under Additions and Deletions of Baskets and Items

1 code implementation • 27 Jan 2022 • Benjamin Longxiang Wang, Sebastian Schelter

Our results show that our method provides constant update time efficiency with respect to an additional user basket in the incremental case, and linear efficiency in the decremental case where we delete existing baskets.

Next-basket recommendation Sequential Recommendation

Understanding and Mitigating the Effect of Outliers in Fair Ranking

1 code implementation • 21 Dec 2021 • Fatemeh Sarvi, Maria Heuss, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke

We formalize outlierness in a ranking, show that outliers are present in realistic datasets, and present the results of an eye-tracking study, showing that users' scanning order and the exposure of items are influenced by the presence of outliers.

Fairness Outlier Detection +1

Parameter Efficient Deep Probabilistic Forecasting

1 code implementation • 6 Dec 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke

However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models.

Probabilistic Time Series Forecasting Time Series

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

1 code implementation • 3 Jun 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke

We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner.

regression Time Series Analysis

Analyzing and Predicting Purchase Intent in E-commerce: Anonymous vs. Identified Customers

no code implementations • 16 Dec 2020 • Mariya Hendriksen, Ernst Kuiper, Pim Nauts, Sebastian Schelter, Maarten de Rijke

In this paper, we focus on purchase prediction for both anonymous and identified sessions on an e-commerce platform.

Descriptive

A Comparison of Supervised Learning to Match Methods for Product Search

1 code implementation • 20 Jul 2020 • Fatemeh Sarvi, Nikos Voskarides, Lois Mooiman, Sebastian Schelter, Maarten de Rijke

As recent learning-to-match methods have made important advances in bridging the vocabulary gap for these traditional IR areas, we investigate their potential in the context of product search.

ARC Attribute +3

FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions

no code implementations • 28 Nov 2019 • Sebastian Schelter, Yuxuan He, Jatin Khilnani, Julia Stoyanovich

FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning.

BIG-bench Machine Learning Decision Making +2

Doubly stochastic large scale kernel learning with the empirical kernel map

no code implementations • 2 Sep 2016 • Nikolaas Steenbergen, Sebastian Schelter, Felix Bießmann

With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again.

Stochastic Optimization

Factorbird - a Parameter Server Approach to Distributed Matrix Factorization

no code implementations • 3 Nov 2014 • Sebastian Schelter, Venu Satuluri, Reza Zadeh

We present Factorbird, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms.
