Search Results for author: Thorsten Joachims

Found 56 papers, 15 papers with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

2 code implementations 25 Apr 2024 Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.

Continuous Control, Image Generation, +3

Language-Based User Profiles for Recommendation

no code implementations 23 Feb 2024 Joyce Zhou, Yijia Dai, Thorsten Joachims

The encoder LLM generates a compact natural-language profile of the user's interests from the user's rating history.

Decoder

Reviewer2: Optimizing Review Generation Through Prompt Generation

1 code implementation 16 Feb 2024 Zhaolin Gao, Kianté Brantley, Thorsten Joachims

In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.

Review Generation

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

no code implementations 9 Feb 2024 Yuta Saito, Jihan Yao, Thorsten Joachims

We also show that POTEC provides a strict generalization of policy- and regression-based approaches and their associated assumptions.

regression

Unbiased Offline Evaluation for Learning to Rank with Business Rules

no code implementations 3 Nov 2023 Matej Jakimov, Alexander Buchholz, Yannik Stein, Thorsten Joachims

For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs present in real-world production systems.

Learning-To-Rank, Off-policy evaluation

Ranking with Slot Constraints

no code implementations 27 Oct 2023 Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims

We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial.

GPT as a Baseline for Recommendation Explanation Texts

no code implementations 16 Sep 2023 Joyce Zhou, Thorsten Joachims

In this work, we establish a baseline potential for how modern model-generated text explanations of movie recommendations may help users, and explore which components of these text explanations users like or dislike, especially in contrast to existing human movie reviews.

Fairness in Ranking under Disparate Uncertainty

1 code implementation 4 Sep 2023 Richa Rastogi, Thorsten Joachims

While ranking can make human evaluation more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options.

Fairness

Ranking with Long-Term Constraints

1 code implementation 10 Jul 2023 Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.

Fairness

Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters

no code implementations 30 Jun 2023 Jinsook Lee, Bradon Thymes, Joyce Zhou, Thorsten Joachims, Rene F. Kizilcec

University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered, to compose an excellent and diverse class.

Diversity

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

no code implementations 14 May 2023 Yuta Saito, Qingyang Ren, Thorsten Joachims

To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect.

Off-policy evaluation

Boosted Off-Policy Learning

no code implementations 1 Aug 2022 Ben London, Levi Lu, Ted Sandler, Thorsten Joachims

We propose the first boosting algorithm for off-policy learning from logged bandit feedback.

Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking

1 code implementation 15 Jun 2022 Yuta Saito, Thorsten Joachims

Our axioms of envy-freeness and dominance over uniform ranking postulate that for a fair ranking policy every item should prefer their own rank allocation over that of any other item, and that no item should be actively disadvantaged by the rankings.

Fairness

Uncertainty Quantification for Fairness in Two-Stage Recommender Systems

1 code implementation 30 May 2022 Lequn Wang, Thorsten Joachims

To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders.

Fairness, Recommendation Systems, +2

Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

no code implementations 22 Apr 2022 Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon

A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions.

counterfactual, Information Retrieval, +2

Off-Policy Evaluation for Large Action Spaces via Embeddings

3 code implementations 13 Feb 2022 Yuta Saito, Thorsten Joachims

Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.

Multi-Armed Bandits, Off-policy evaluation, +1

Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits

no code implementations 3 Feb 2022 Aaron David Tucker, Thorsten Joachims

Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data.

counterfactual, Multi-Armed Bandits, +1

Improving Screening Processes via Calibrated Subset Selection

1 code implementation 2 Feb 2022 Lequn Wang, Thorsten Joachims, Manuel Gomez Rodriguez

Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates.

Diversity, Retrieval

Fairness in Ranking under Uncertainty

1 code implementation NeurIPS 2021 Ashudeep Singh, David Kempe, Thorsten Joachims

We call an algorithm $\phi$-fair (for $\phi \in [0, 1]$) if it has the following property for all agents $x$ and all $k$: if agent $x$ is among the top $k$ agents with respect to merit with probability at least $\rho$ (according to the posterior merit distribution), then the algorithm places the agent among the top $k$ agents in its ranking with probability at least $\phi \rho$.

Decision Making, Fairness

Optimizing Rankings for Recommendation in Matching Markets

no code implementations 3 Jun 2021 Yi Su, Magd Bayoumi, Thorsten Joachims

Based on the success of recommender systems in e-commerce, there is growing interest in their use in matching markets (e.g., labor).

Fairness, Recommendation Systems

Fairness of Exposure in Stochastic Bandits

no code implementations 3 Mar 2021 Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.

Fairness, Multi-Armed Bandits

User Fairness, Item Fairness, and Diversity for Rankings in Two-Sided Markets

no code implementations 4 Oct 2020 Lequn Wang, Thorsten Joachims

The algorithm optimizes user and item fairness as a convex optimization problem which can be solved optimally.

Diversity, Fairness

Content-based Music Similarity with Triplet Networks

no code implementations 11 Aug 2020 Joseph Cleveland, Derek Cheng, Michael Zhou, Thorsten Joachims, Douglas Turnbull

We compare two models that are trained using different ways of picking this third song: at random vs. based on shared genre labels.

Retrieval

Off-policy Bandits with Deficient Support

1 code implementation 16 Jun 2020 Noveen Sachdeva, Yi Su, Thorsten Joachims

Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g., voice assistants, recommendation, search), since it enables the reuse of large amounts of log data.

Controlling Fairness and Bias in Dynamic Learning-to-Rank

1 code implementation 29 May 2020 Marco Morik, Ashudeep Singh, Jessica Hong, Thorsten Joachims

Rankings are the primary interface through which many online platforms match users to items (e.g., news, products, music, video).

Fairness, Learning-To-Rank

Policy-Gradient Training of Fair and Unbiased Ranking Functions

1 code implementation 19 Nov 2019 Himank Yadav, Zhengxiao Du, Thorsten Joachims

[...] is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons.

counterfactual, Decision Making, +2

Policy Learning for Fairness in Ranking

1 code implementation NeurIPS 2019 Ashudeep Singh, Thorsten Joachims

Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items.

Fairness, Learning-To-Rank

Intervention Harvesting for Context-Dependent Examination-Bias Estimation

no code implementations 5 Nov 2018 Zhichong Fang, Aman Agarwal, Thorsten Joachims

To overcome this limitation, we propose a Contextual Position-Based Model (CPBM) where the examination bias may also depend on a context vector describing the query and the user.

Learning-To-Rank, Position, +1

Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank

no code implementations 9 Jun 2018 Aman Agarwal, Ivan Zaitsev, Thorsten Joachims

Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors.

counterfactual, Learning-To-Rank, +1

A General Framework for Counterfactual Learning-to-Rank

no code implementations 30 Apr 2018 Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, Thorsten Joachims

Specifically, we derive a relaxation for propensity-weighted rank-based metrics which is subdifferentiable and thus suitable for gradient-based optimization.

counterfactual, Counterfactual Inference, +1

Fairness of Exposure in Rankings

no code implementations 20 Feb 2018 Ashudeep Singh, Thorsten Joachims

Rankings are ubiquitous in the online world today.

Fairness

Deep Learning with Logged Bandit Feedback

no code implementations ICLR 2018 Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.

counterfactual, Object Recognition, +1

Effective Evaluation using Logged Bandit Feedback from Multiple Loggers

no code implementations 17 Mar 2017 Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims

In this paper, we address the question of how to estimate the performance of a new target policy when we have log data from multiple historic policies.

counterfactual

Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

no code implementations 1 Dec 2016 Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke

The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.

counterfactual, Off-policy evaluation, +1

Unbiased Comparative Evaluation of Ranking Functions

no code implementations 25 Apr 2016 Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims

Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.

Unbounded Human Learning: Optimal Scheduling for Spaced Repetition

1 code implementation 23 Feb 2016 Siddharth Reddy, Igor Labutov, Siddhartha Banerjee, Thorsten Joachims

Second, we use this memory model to develop a stochastic model for spaced repetition systems.

Scheduling

Latent Skill Embedding for Personalized Lesson Sequence Recommendation

no code implementations 23 Feb 2016 Siddharth Reddy, Igor Labutov, Thorsten Joachims

In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helping students prepare for specific assessments.

Collaborative Filtering, Recommendation Systems

Recommendations as Treatments: Debiasing Learning and Evaluation

no code implementations 17 Feb 2016 Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.

Causal Inference, Recommendation Systems

Learning Preferences for Manipulation Tasks from Online Coactive Feedback

no code implementations 5 Jan 2016 Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots.

The Self-Normalized Estimator for Counterfactual Learning

no code implementations NeurIPS 2015 Adith Swaminathan, Thorsten Joachims

This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback as in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.

counterfactual

Using Shortlists to Support Decision Making and Improve Recommender System Performance

no code implementations 26 Oct 2015 Tobias Schnabel, Paul N. Bennett, Susan T. Dumais, Thorsten Joachims

From a machine learning perspective, adding items to the shortlist generates a new implicit feedback signal as a by-product of exploration and decision making which can improve recommendation quality.

Decision Making, Movie Recommendation, +1

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

no code implementations 9 Feb 2015 Adith Swaminathan, Thorsten Joachims

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.

counterfactual, Multi-Label Classification

Reducing Dueling Bandits to Cardinal Bandits

no code implementations 14 May 2014 Nir Ailon, Thorsten Joachims, Zohar Karnin

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem.

Multi-Armed Bandits

Methods for Ordinal Peer Grading

no code implementations 14 Apr 2014 Karthik Raman, Thorsten Joachims

Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback.

Structured learning of sum-of-submodular higher order energy functions

no code implementations 28 Sep 2013 Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih

Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set.

Interactive Segmentation

Learning Trajectory Preferences for Manipulators via Iterative Improvement

no code implementations NeurIPS 2013 Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks.

Semantic Labeling of 3D Point Clouds for Indoor Scenes

no code implementations NeurIPS 2011 Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes.

Object
