1 code implementation • 20 Jan 2025 • Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini, Leonardo Venuta
Learned sparse text embeddings have gained popularity due to their effectiveness in top-k retrieval and inherent interpretability.
no code implementations • 8 Aug 2024 • Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and each block's summary determining whether the block must be evaluated.
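The block-skipping idea above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes each block carries a "summary" mapping each term to its maximum weight within the block, so the query-summary inner product upper-bounds any document score inside the block.

```python
def traverse_list(query, blocks, threshold):
    """Traverse an inverted list one block at a time, skipping blocks
    whose summary-based score upper bound falls below the threshold.

    query:  dict mapping term -> query weight
    blocks: list of dicts with keys "summary" (term -> max weight in block)
            and "docs" (list of (doc_id, term -> weight) pairs)
    """
    results = []
    for block in blocks:
        # Upper bound on any document score in this block.
        upper_bound = sum(w * block["summary"].get(t, 0.0)
                          for t, w in query.items())
        if upper_bound < threshold:
            continue  # skip: no document inside can clear the threshold
        for doc_id, doc in block["docs"]:
            score = sum(w * doc.get(t, 0.0) for t, w in query.items())
            results.append((score, doc_id))
    return results
```

In a full system the threshold would typically be the score of the current k-th best candidate, tightening as traversal proceeds.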
1 code implementation • 20 May 2024 • Sebastian Bruch, Aditya Krishnan, Franco Maria Nardini
Clustering-based nearest neighbor search is an effective method in which points are partitioned into geometric shards to form an index, with only a few shards searched during query processing to find the top-$k$ vectors.
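A minimal sketch of this clustering-based scheme, assuming centroids are given (in practice they would come from k-means or similar): points are assigned to their nearest centroid to form shards, and a query scans only the `nprobe` shards whose centroids lie closest to it.

```python
import heapq

def dist2(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_index(points, centroids):
    """Assign each point to its nearest centroid, forming geometric shards."""
    shards = {i: [] for i in range(len(centroids))}
    for pid, p in enumerate(points):
        best = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
        shards[best].append((pid, p))
    return shards

def search(query, centroids, shards, k, nprobe):
    """Scan only the nprobe shards closest to the query; return top-k by distance."""
    order = sorted(range(len(centroids)), key=lambda i: dist2(query, centroids[i]))
    candidates = []
    for i in order[:nprobe]:
        for pid, p in shards[i]:
            candidates.append((dist2(query, p), pid))
    return heapq.nsmallest(k, candidates)
```

Accuracy degrades when a true neighbor falls in an unprobed shard, which is exactly the routing question this line of work studies.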
1 code implementation • 29 Apr 2024 • Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings.
1 code implementation • 17 Apr 2024 • Thomas Vecchiato, Claudio Lucchese, Franco Maria Nardini, Sebastian Bruch
Its objective is to return a set of $k$ data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set.
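The accuracy measure described above is commonly written as recall@$k$; a one-line sketch:

```python
def recall_at_k(returned, exact, k):
    """Fraction of the true k nearest neighbors captured in the returned set."""
    return len(set(returned[:k]) & set(exact[:k])) / k
```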
no code implementations • 17 Jan 2024 • Sebastian Bruch
Vectors are universal mathematical objects that can represent text, images, speech, or a mix of these data modalities.
no code implementations • 16 Sep 2023 • Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-$k$ retrieval in Information Retrieval.
no code implementations • 15 May 2023 • Sebastian Bruch, Claudio Lucchese, Franco Maria Nardini
We believe that by understanding the fundamentals underpinning these algorithmic and data structure solutions for managing the tension between efficiency and effectiveness, one can better identify future directions and more readily assess the merits of new ideas.
no code implementations • 25 Jan 2023 • Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
To achieve optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages.
1 code implementation • 6 Dec 2022 • Mathieu Guillame-Bert, Sebastian Bruch, Richard Stotz, Jan Pfeifer
Yggdrasil Decision Forests is a library for the training, serving, and interpretation of decision forest models, targeted at both research and production work. It is implemented in C++ and available in C++, through a command line interface, in Python (under the name TensorFlow Decision Forests), JavaScript, Go, and Google Sheets (under the name Simple ML for Sheets).
no code implementations • 21 Oct 2022 • Sebastian Bruch, Siyu Gai, Amir Ingber
In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls.
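The two fusion methods named above are standard and easy to state. Below is a minimal sketch: convex combination (CC) mixes the raw lexical and semantic scores with a weight $\alpha$, while Reciprocal Rank Fusion (RRF) ignores scores entirely and sums $1/(k + \text{rank})$ across systems (the smoothing constant $k$, conventionally 60, is a tunable parameter).

```python
def convex_combination(lexical, semantic, alpha):
    """CC fusion: alpha * lexical + (1 - alpha) * semantic, per document.
    Missing scores default to 0.0; in practice scores are often normalized first."""
    docs = set(lexical) | set(semantic)
    return {d: alpha * lexical.get(d, 0.0) + (1 - alpha) * semantic.get(d, 0.0)
            for d in docs}

def reciprocal_rank_fusion(rankings, k=60):
    """RRF fusion: each system contributes 1 / (k + rank), rank being 1-based."""
    scores = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            scores[d] = scores.get(d, 0.0) + 1.0 / (k + rank)
    return scores
```

One pitfall visible in the sketch: CC is sensitive to the scale of the two score distributions, whereas RRF discards score magnitudes altogether.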
no code implementations • 21 Sep 2020 • Mathieu Guillame-Bert, Sebastian Bruch, Petr Mitrichev, Petr Mikheev, Jan Pfeifer
We define a condition that is specific to categorical-set features -- defined as an unordered set of categorical variables -- and present an algorithm to learn it, thereby equipping decision forests with the ability to directly model text, albeit without preserving sequential order.
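A hypothetical sketch of what such a condition could look like (the mask-search strategy here is an illustrative greedy stand-in, not the paper's learning algorithm): the split asks whether an example's categorical set intersects a learned mask of categories, and the learner picks the mask that best separates the labels.

```python
def intersects(example_set, mask):
    """Categorical-set condition: does the unordered set of categories
    share any element with the learned mask?"""
    return bool(example_set & mask)

def best_mask(examples, labels, candidate_masks):
    """Greedy illustration: choose the candidate mask whose induced split
    best matches the binary labels (either polarity of the condition)."""
    def accuracy(mask):
        correct = sum(intersects(s, mask) == y for s, y in zip(examples, labels))
        return max(correct, len(labels) - correct) / len(labels)
    return max(candidate_masks, key=accuracy)
```

Treating text as a bag of tokens, such conditions let a tree split directly on token sets without preserving sequential order, as the abstract notes.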
no code implementations • 29 Jul 2020 • Sebastian Bruch, Jan Pfeifer, Mathieu Guillame-Bert
Axis-aligned decision forests have long been the leading class of machine learning algorithms for modeling tabular data.
no code implementations • 22 Nov 2019 • Sebastian Bruch
Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval.
2 code implementations • 30 Nov 2018 • Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan Wolf
We propose TensorFlow Ranking, the first open source library for solving large-scale ranking problems in a deep learning framework.
2 code implementations • 11 Nov 2018 • Qingyao Ai, Xuanhui Wang, Sebastian Bruch, Nadav Golbandi, Michael Bendersky, Marc Najork
To overcome this limitation, we propose a new framework for multivariate scoring functions, in which the relevance score of a document is determined jointly by multiple documents in the list.
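To make the contrast with univariate scoring concrete, here is a toy sketch (the cross-document interaction via the list mean is an illustrative assumption; the actual framework learns this interaction): each document's score depends not only on its own feature but also on the other documents in the list.

```python
def multivariate_scores(features, context_weight=0.1):
    """Toy groupwise scorer: a document's score combines its own feature
    with its position relative to the list mean, so scores change when
    the surrounding documents change -- unlike a univariate scorer."""
    mean = sum(features) / len(features)
    return [f + context_weight * (f - mean) for f in features]
```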