1 code implementation • 6 Sep 2024 • Zeyu Zhang, Paul Groth, Iacer Calixto, Sebastian Schelter
Furthermore, our approach exhibits major cost benefits: the average prediction quality of AnyMatch is within 4.4% of the state-of-the-art method MatchGPT with the proprietary trillion-parameter model GPT-4, yet AnyMatch requires four orders of magnitude fewer parameters and incurs a 3,899 times lower inference cost (in dollars per 1,000 tokens).
1 code implementation • 30 Apr 2024 • Stefan Grafberger, Paul Groth, Sebastian Schelter
Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings.
1 code implementation • 19 Oct 2023 • Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke
We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level.
1 code implementation • 6 Jul 2023 • Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang
The multilinear extension has exponentially many terms. One key contribution of this paper is a polynomial-time algorithm that, given a retrieval-augmented model with an additive utility function and a validation set, exactly computes the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function.
1 code implementation • 1 May 2023 • Fatemeh Sarvi, Ali Vardasbi, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke
We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called outlier-aware position-based model (OPBM).
1 code implementation • 23 Apr 2022 • Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang
We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.
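To make the notion of a Shapley value of a training example concrete, the following is a minimal brute-force sketch. It is not DataScope's algorithm: it enumerates every subset of a toy training set and so runs in exponential time. The 1-nearest-neighbour utility, the function names, and the toy data are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def utility(train_subset, validation):
    """Hypothetical utility: validation accuracy of a 1-nearest-neighbour
    classifier trained on `train_subset` (0.0 for an empty subset)."""
    if not train_subset:
        return 0.0
    correct = 0
    for x, y in validation:
        nearest = min(train_subset, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(validation)


def exact_shapley(train, validation):
    """Exact Shapley value of each training example by subset enumeration.
    Exponential in len(train); only usable for toy inputs."""
    n = len(train)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of coalitions of size k that do not contain example i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [train[j] for j in subset]
                with_i = without_i + [train[i]]
                gain = utility(with_i, validation) - utility(without_i, validation)
                values[i] += weight * gain
    return values


# Toy one-dimensional data: (feature, label) pairs, invented for this sketch.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (4.5, 0)]
validation = [(0.5, 0), (4.2, 1), (4.8, 1)]
values = exact_shapley(train, validation)
```

By the efficiency property of Shapley values, the entries of `values` sum to the utility of the full training set minus the utility of the empty set, which gives a quick sanity check on any implementation.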
1 code implementation • 27 Jan 2022 • Benjamin Longxiang Wang, Sebastian Schelter
Our results show that our method provides constant update time efficiency with respect to an additional user basket in the incremental case, and linear efficiency in the decremental case where we delete existing baskets.
1 code implementation • 21 Dec 2021 • Fatemeh Sarvi, Maria Heuss, Mohammad Aliannejadi, Sebastian Schelter, Maarten de Rijke
We formalize outlierness in a ranking, show that outliers are present in realistic datasets, and present the results of an eye-tracking study, showing that users' scanning order and the exposure of items are influenced by the presence of outliers.
1 code implementation • 6 Dec 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke
However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models.
1 code implementation • 3 Jun 2021 • Olivier Sprangers, Sebastian Schelter, Maarten de Rijke
We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner.
no code implementations • 16 Dec 2020 • Mariya Hendriksen, Ernst Kuiper, Pim Nauts, Sebastian Schelter, Maarten de Rijke
In this paper, we focus on purchase prediction for both anonymous and identified sessions on an e-commerce platform.
1 code implementation • 20 Jul 2020 • Fatemeh Sarvi, Nikos Voskarides, Lois Mooiman, Sebastian Schelter, Maarten de Rijke
As recent learning-to-match methods have made important advances in bridging the vocabulary gap for these traditional IR areas, we investigate their potential in the context of product search.
no code implementations • 28 Nov 2019 • Sebastian Schelter, Yuxuan He, Jatin Khilnani, Julia Stoyanovich
FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning.
no code implementations • 2 Sep 2016 • Nikolaas Steenbergen, Sebastian Schelter, Felix Bießmann
With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again.
no code implementations • 3 Nov 2014 • Sebastian Schelter, Venu Satuluri, Reza Zadeh
We present Factorbird, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms.
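As a rough illustration of the kind of computation involved, the sketch below factorizes a small sparse matrix with plain single-process SGD. It does not reproduce Factorbird or its parameter server architecture; the function names, hyperparameters, and toy data are assumptions made for this example.

```python
import random


def factorize(entries, n_rows, n_cols, rank=2, lr=0.02, reg=0.01,
              epochs=500, seed=0):
    """Approximate a sparse matrix as U @ V^T with SGD.
    `entries` is a list of (row, col, value) triples for observed cells."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    samples = list(entries)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(samples)
        for i, j, value in samples:
            pred = sum(U[i][f] * V[j][f] for f in range(rank))
            err = value - pred
            for f in range(rank):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V


def rmse(entries, U, V):
    """Root mean squared error over the observed entries."""
    total = 0.0
    for i, j, value in entries:
        pred = sum(u * v for u, v in zip(U[i], V[j]))
        total += (value - pred) ** 2
    return (total / len(entries)) ** 0.5


# Toy 3x3 matrix with six observed cells, invented for this sketch.
entries = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U0, V0 = factorize(entries, 3, 3, epochs=0)  # untrained initial factors
U, V = factorize(entries, 3, 3)
rmse_before = rmse(entries, U0, V0)
rmse_after = rmse(entries, U, V)
```

In a parameter server setting, the factor matrices would be partitioned across server machines and updated by many workers; this sketch keeps everything in one process to show only the update rule.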