Search Results for author: Thorsten Joachims

Found 56 papers, 12 papers with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

no code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications including the fine-tuning of generative models.

Language-Based User Profiles for Recommendation

no code implementations • 23 Feb 2024 • Joyce Zhou, Yijia Dai, Thorsten Joachims

The encoder LLM generates a compact natural-language profile of the user's interests from the user's rating history.

Reviewer2: Optimizing Review Generation Through Prompt Generation

no code implementations • 16 Feb 2024 • Zhaolin Gao, Kianté Brantley, Thorsten Joachims

In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.

Review Generation

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

no code implementations • 9 Feb 2024 • Yuta Saito, Jihan Yao, Thorsten Joachims

We also show that POTEC provides a strict generalization of policy- and regression-based approaches and their associated assumptions.

regression

Unbiased Offline Evaluation for Learning to Rank with Business Rules

no code implementations • 3 Nov 2023 • Matej Jakimov, Alexander Buchholz, Yannik Stein, Thorsten Joachims

For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs present in real-world production systems.

Learning-To-Rank Off-policy evaluation

Ranking with Slot Constraints

no code implementations • 27 Oct 2023 • Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims

We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial.

GPT as a Baseline for Recommendation Explanation Texts

no code implementations • 16 Sep 2023 • Joyce Zhou, Thorsten Joachims

In this work, we establish a baseline potential for how modern model-generated text explanations of movie recommendations may help users, and explore which components of these text explanations users like or dislike, especially in contrast to existing human movie reviews.

Fair Ranking under Disparate Uncertainty

no code implementations • 4 Sep 2023 • Richa Rastogi, Thorsten Joachims

While ranking can make human evaluation more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options.

Fairness

Ranking with Long-Term Constraints

1 code implementation • 10 Jul 2023 • Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.

Fairness

Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters

no code implementations • 30 Jun 2023 • Jinsook Lee, Bradon Thymes, Joyce Zhou, Thorsten Joachims, Rene F. Kizilcec

University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered, to compose an excellent and diverse class.

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

no code implementations • 14 May 2023 • Yuta Saito, Qingyang Ren, Thorsten Joachims

To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect.

Off-policy evaluation

Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model

no code implementations • 15 Oct 2022 • Alexander Buchholz, Ben London, Giuseppe Di Benedetto, Thorsten Joachims

A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production.

Learning-To-Rank Off-policy evaluation +2

Boosted Off-Policy Learning

no code implementations • 1 Aug 2022 • Ben London, Levi Lu, Ted Sandler, Thorsten Joachims

We propose the first boosting algorithm for off-policy learning from logged bandit feedback.

Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking

1 code implementation • 15 Jun 2022 • Yuta Saito, Thorsten Joachims

Our axioms of envy-freeness and dominance over uniform ranking postulate that for a fair ranking policy every item should prefer their own rank allocation over that of any other item, and that no item should be actively disadvantaged by the rankings.

Fairness

Uncertainty Quantification for Fairness in Two-Stage Recommender Systems

1 code implementation • 30 May 2022 • Lequn Wang, Thorsten Joachims

To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders.

Fairness Recommendation Systems +2

Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

no code implementations • 22 Apr 2022 • Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon

A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions.

counterfactual Information Retrieval +2

Off-Policy Evaluation for Large Action Spaces via Embeddings

3 code implementations • 13 Feb 2022 • Yuta Saito, Thorsten Joachims

Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.

Multi-Armed Bandits Off-policy evaluation +1

Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits

no code implementations • 3 Feb 2022 • Aaron David Tucker, Thorsten Joachims

Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data.

counterfactual Multi-Armed Bandits +1

Improving Screening Processes via Calibrated Subset Selection

1 code implementation • 2 Feb 2022 • Lequn Wang, Thorsten Joachims, Manuel Gomez Rodriguez

Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates.

Retrieval

Fairness in Ranking under Uncertainty

1 code implementation • NeurIPS 2021 • Ashudeep Singh, David Kempe, Thorsten Joachims

We call an algorithm $\phi$-fair (for $\phi \in [0, 1]$) if it has the following property for all agents $x$ and all $k$: if agent $x$ is among the top $k$ agents with respect to merit with probability at least $\rho$ (according to the posterior merit distribution), then the algorithm places the agent among the top $k$ agents in its ranking with probability at least $\phi \rho$.

Decision Making Fairness

Optimizing Rankings for Recommendation in Matching Markets

no code implementations • 3 Jun 2021 • Yi Su, Magd Bayoumi, Thorsten Joachims

Based on the success of recommender systems in e-commerce, there is growing interest in their use in matching markets (e.g., labor).

Fairness Recommendation Systems

Fairness of Exposure in Stochastic Bandits

no code implementations • 3 Mar 2021 • Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.

Fairness Multi-Armed Bandits

MOReL: Model-Based Offline Reinforcement Learning

no code implementations • NeurIPS 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

In this work, we present MOReL, an algorithmic framework for model-based offline RL.

Offline RL reinforcement-learning +1

User Fairness, Item Fairness, and Diversity for Rankings in Two-Sided Markets

no code implementations • 4 Oct 2020 • Lequn Wang, Thorsten Joachims

The algorithm optimizes user and item fairness as a convex optimization problem which can be solved optimally.

Fairness

Content-based Music Similarity with Triplet Networks

no code implementations • 11 Aug 2020 • Joseph Cleveland, Derek Cheng, Michael Zhou, Thorsten Joachims, Douglas Turnbull

We compare two models that are trained using different ways of picking this third song: at random vs. based on shared genre labels.

Retrieval

Off-policy Bandits with Deficient Support

1 code implementation • 16 Jun 2020 • Noveen Sachdeva, Yi Su, Thorsten Joachims

Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g., voice assistants, recommendation, search), since it enables the reuse of large amounts of log data.

Controlling Fairness and Bias in Dynamic Learning-to-Rank

1 code implementation • 29 May 2020 • Marco Morik, Ashudeep Singh, Jessica Hong, Thorsten Joachims

Rankings are the primary interface through which many online platforms match users to items (e.g., news, products, music, video).

Fairness Learning-To-Rank

MOReL: Model-Based Offline Reinforcement Learning

2 code implementations • 12 May 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims

In this work, we present MOReL, an algorithmic framework for model-based offline RL.

Offline RL reinforcement-learning +1

Policy-Gradient Training of Fair and Unbiased Ranking Functions

1 code implementation • 19 Nov 2019 • Himank Yadav, Zhengxiao Du, Thorsten Joachims

is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons.

counterfactual Decision Making +2

Policy Learning for Fairness in Ranking

1 code implementation • NeurIPS 2019 • Ashudeep Singh, Thorsten Joachims

Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items.

Fairness Learning-To-Rank

CAB: Continuous Adaptive Blending Estimator for Policy Evaluation and Learning

no code implementations • 6 Nov 2018 • Yi Su, Lequn Wang, Michele Santacatterina, Thorsten Joachims

In addition, it is sub-differentiable such that it can be used for learning, unlike the SWITCH estimator.

counterfactual Recommendation Systems

Intervention Harvesting for Context-Dependent Examination-Bias Estimation

no code implementations • 5 Nov 2018 • Zhichong Fang, Aman Agarwal, Thorsten Joachims

To overcome this limitation, we propose a Contextual Position-Based Model (CPBM) where the examination bias may also depend on a context vector describing the query and the user.

Learning-To-Rank Position +1

Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank

no code implementations • 9 Jun 2018 • Aman Agarwal, Ivan Zaitsev, Thorsten Joachims

Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors.

counterfactual Learning-To-Rank +1

A General Framework for Counterfactual Learning-to-Rank

no code implementations • 30 Apr 2018 • Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, Thorsten Joachims

Specifically, we derive a relaxation for propensity-weighted rank-based metrics which is subdifferentiable and thus suitable for gradient-based optimization.

counterfactual Counterfactual Inference +1

Fairness of Exposure in Rankings

no code implementations • 20 Feb 2018 • Ashudeep Singh, Thorsten Joachims

Rankings are ubiquitous in the online world today.

Fairness

Deep Learning with Logged Bandit Feedback

no code implementations • ICLR 2018 • Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.

counterfactual Object Recognition +1

Effective Evaluation using Logged Bandit Feedback from Multiple Loggers

no code implementations • 17 Mar 2017 • Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims

In this paper, we address the question of how to estimate the performance of a new target policy when we have log data from multiple historic policies.

counterfactual

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

no code implementations • 1 Dec 2016 • Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.

counterfactual Off-policy evaluation +1

Unbiased Learning-to-Rank with Biased Feedback

no code implementations • 16 Aug 2016 • Thorsten Joachims, Adith Swaminathan, Tobias Schnabel

Implicit feedback (e.g., clicks, dwell times, etc.)

counterfactual Counterfactual Inference +2

Unbiased Comparative Evaluation of Ranking Functions

no code implementations • 25 Apr 2016 • Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.

Latent Skill Embedding for Personalized Lesson Sequence Recommendation

no code implementations • 23 Feb 2016 • Siddharth Reddy, Igor Labutov, Thorsten Joachims

In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helping students prepare for specific assessments.

Collaborative Filtering Recommendation Systems

Unbounded Human Learning: Optimal Scheduling for Spaced Repetition

1 code implementation • 23 Feb 2016 • Siddharth Reddy, Igor Labutov, Siddhartha Banerjee, Thorsten Joachims

Second, we use this memory model to develop a stochastic model for spaced repetition systems.

Scheduling

Recommendations as Treatments: Debiasing Learning and Evaluation

no code implementations • 17 Feb 2016 • Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.

Causal Inference Recommendation Systems

Learning Preferences for Manipulation Tasks from Online Coactive Feedback

no code implementations • 5 Jan 2016 • Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots.

The Self-Normalized Estimator for Counterfactual Learning

no code implementations • NeurIPS 2015 • Adith Swaminathan, Thorsten Joachims

This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.

counterfactual

Using Shortlists to Support Decision Making and Improve Recommender System Performance

no code implementations • 26 Oct 2015 • Tobias Schnabel, Paul N. Bennett, Susan T. Dumais, Thorsten Joachims

From a machine learning perspective, adding items to the shortlist generates a new implicit feedback signal as a by-product of exploration and decision making which can improve recommendation quality.

Decision Making Movie Recommendation +1

Evaluation methods for unsupervised word embeddings

no code implementations • EMNLP 2015 • Tobias Schnabel, Igor Labutov, David Mimno, Thorsten Joachims

Information Retrieval Named Entity Recognition (NER) +2

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

no code implementations • 9 Feb 2015 • Adith Swaminathan, Thorsten Joachims

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.

counterfactual Multi-Label Classification

Invited Talk: Learning from Rational Behavior

no code implementations • EMNLP 2014 • Thorsten Joachims

Decision Making Learning-To-Rank +2

Reducing Dueling Bandits to Cardinal Bandits

no code implementations • 14 May 2014 • Nir Ailon, Thorsten Joachims, Zohar Karnin

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem.

Multi-Armed Bandits

Methods for Ordinal Peer Grading

no code implementations • 14 Apr 2014 • Karthik Raman, Thorsten Joachims

Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback.

Structured learning of sum-of-submodular higher order energy functions

no code implementations • 28 Sep 2013 • Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih

Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set.

Interactive Segmentation

Learning Trajectory Preferences for Manipulators via Iterative Improvement

no code implementations • NeurIPS 2013 • Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks.

Large-Margin Learning of Submodular Summarization Models

no code implementations • EACL 2012 • Ruben Sipos, Pannaga Shivaswamy, Thorsten Joachims

Document Summarization Information Retrieval +2

Semantic Labeling of 3D Point Clouds for Indoor Scenes

no code implementations • NeurIPS 2011 • Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes.

Object

Learning diverse rankings with multi-armed bandits

no code implementations • International Conference on Machine Learning 2008 • Filip Radlinski, Robert Kleinberg, Thorsten Joachims

Algorithms for learning to rank Web documents usually assume a document's relevance is independent of other documents.

Learning-To-Rank Multi-Armed Bandits
