2 code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
no code implementations • 23 Feb 2024 • Joyce Zhou, Yijia Dai, Thorsten Joachims
The encoder LLM generates a compact natural-language profile of the user's interests from the user's rating history.
1 code implementation • 16 Feb 2024 • Zhaolin Gao, Kianté Brantley, Thorsten Joachims
In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.
no code implementations • 9 Feb 2024 • Yuta Saito, Jihan Yao, Thorsten Joachims
We also show that POTEC provides a strict generalization of policy- and regression-based approaches and their associated assumptions.
no code implementations • 3 Nov 2023 • Matej Jakimov, Alexander Buchholz, Yannik Stein, Thorsten Joachims
For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs present in real-world production systems.
no code implementations • 27 Oct 2023 • Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims
We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial.
no code implementations • 16 Sep 2023 • Joyce Zhou, Thorsten Joachims
In this work, we establish a baseline potential for how modern model-generated text explanations of movie recommendations may help users, and explore which components of these text explanations users like or dislike, especially in contrast to existing human movie reviews.
1 code implementation • 4 Sep 2023 • Richa Rastogi, Thorsten Joachims
While ranking can make human evaluation more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options.
1 code implementation • 10 Jul 2023 • Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims
The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.
no code implementations • 30 Jun 2023 • Jinsook Lee, Bradon Thymes, Joyce Zhou, Thorsten Joachims, Rene F. Kizilcec
University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered, to compose an excellent and diverse class.
no code implementations • 14 May 2023 • Yuta Saito, Qingyang Ren, Thorsten Joachims
To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect.
no code implementations • 15 Oct 2022 • Alexander Buchholz, Ben London, Giuseppe Di Benedetto, Thorsten Joachims
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production.
no code implementations • 1 Aug 2022 • Ben London, Levi Lu, Ted Sandler, Thorsten Joachims
We propose the first boosting algorithm for off-policy learning from logged bandit feedback.
1 code implementation • 15 Jun 2022 • Yuta Saito, Thorsten Joachims
Our axioms of envy-freeness and dominance over uniform ranking postulate that for a fair ranking policy every item should prefer their own rank allocation over that of any other item, and that no item should be actively disadvantaged by the rankings.
1 code implementation • 30 May 2022 • Lequn Wang, Thorsten Joachims
To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders.
no code implementations • 22 Apr 2022 • Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon
A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions.
3 code implementations • 13 Feb 2022 • Yuta Saito, Thorsten Joachims
Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance.
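As a rough illustration of the inverse propensity score weighting mentioned above, the standard IPS estimate averages logged rewards reweighted by the ratio of target-policy to logging-policy action probabilities. This is a minimal sketch, not the estimator proposed in the paper; the function name `ips_estimate` and the toy log values are hypothetical.

```python
def ips_estimate(rewards, target_probs, logging_probs):
    """Standard IPS estimate of a target policy's value from logged data.

    rewards[i]       : reward observed for the logged action in round i
    target_probs[i]  : probability the target policy assigns to that action
    logging_probs[i] : probability the logging policy assigned to that action
    """
    n = len(rewards)
    if n == 0 or not (n == len(target_probs) == len(logging_probs)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, p_target, p_log in zip(rewards, target_probs, logging_probs):
        if p_log <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Importance weight: large when the logging policy rarely chose the action.
        total += (p_target / p_log) * r
    return total / n


# Hypothetical toy log with four rounds (values chosen for illustration only).
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.5, 0.25, 0.5, 0.125]
value = ips_estimate(rewards, target_probs, logging_probs)
```

With many actions, individual logging probabilities tend to become small, so the importance weights can become very large; this is one way to see the variance problem the abstract describes.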
no code implementations • 3 Feb 2022 • Aaron David Tucker, Thorsten Joachims
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data.
1 code implementation • 2 Feb 2022 • Lequn Wang, Thorsten Joachims, Manuel Gomez Rodriguez
Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates.
1 code implementation • NeurIPS 2021 • Ashudeep Singh, David Kempe, Thorsten Joachims
We call an algorithm $\phi$-fair (for $\phi \in [0, 1]$) if it has the following property for all agents $x$ and all $k$: if agent $x$ is among the top $k$ agents with respect to merit with probability at least $\rho$ (according to the posterior merit distribution), then the algorithm places the agent among the top $k$ agents in its ranking with probability at least $\phi \rho$.
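The $\phi$-fairness condition above can be checked numerically once the relevant probabilities are known. The sketch below assumes both sets of top-$k$ probabilities are given as plain dictionaries (a hypothetical representation, not taken from the paper), and uses the fact that the condition for all $\rho$ reduces to checking it at the agent's actual posterior top-$k$ probability.

```python
def is_phi_fair(posterior_topk, ranking_topk, phi, tol=1e-12):
    """Check the phi-fairness condition for all agents x and all k.

    posterior_topk[x][k-1] : probability that agent x is among the top k by merit
                             (under the posterior merit distribution)
    ranking_topk[x][k-1]   : probability that the algorithm places x in its top k
    """
    for agent, merit_probs in posterior_topk.items():
        for k_index, rho in enumerate(merit_probs):
            # Require P(algorithm puts x in top k) >= phi * rho.
            if ranking_topk[agent][k_index] < phi * rho - tol:
                return False
    return True


# Hypothetical two-agent example (numbers are illustrative only).
posterior = {"a": [0.8, 1.0], "b": [0.2, 1.0]}
# A ranking policy that samples a merit vector from the posterior and sorts by it
# reproduces the posterior top-k probabilities exactly.
sample_from_posterior = {"a": [0.8, 1.0], "b": [0.2, 1.0]}
# A deterministic policy that always ranks "a" first.
always_a_first = {"a": [1.0, 1.0], "b": [0.0, 1.0]}

fair_sampling = is_phi_fair(posterior, sample_from_posterior, phi=1.0)
fair_deterministic = is_phi_fair(posterior, always_a_first, phi=1.0)
fair_deterministic_phi0 = is_phi_fair(posterior, always_a_first, phi=0.0)
```

In this toy case the deterministic ranking fails for $\phi = 1$ because agent "b" has a 0.2 posterior chance of being the top agent but is never placed first, while any ranking trivially satisfies the condition for $\phi = 0$.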
no code implementations • 3 Jun 2021 • Yi Su, Magd Bayoumi, Thorsten Joachims
Based on the success of recommender systems in e-commerce, there is growing interest in their use in matching markets (e.g., labor).
no code implementations • 3 Mar 2021 • Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims
Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users.
no code implementations • NeurIPS 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
In this work, we present MOReL, an algorithmic framework for model-based offline RL.
no code implementations • 4 Oct 2020 • Lequn Wang, Thorsten Joachims
The algorithm optimizes user and item fairness as a convex optimization problem which can be solved optimally.
no code implementations • 11 Aug 2020 • Joseph Cleveland, Derek Cheng, Michael Zhou, Thorsten Joachims, Douglas Turnbull
We compare two models that are trained using different ways of picking this third song: at random vs. based on shared genre labels.
1 code implementation • 16 Jun 2020 • Noveen Sachdeva, Yi Su, Thorsten Joachims
Learning effective contextual-bandit policies from past actions of a deployed system is highly desirable in many settings (e.g., voice assistants, recommendation, search), since it enables the reuse of large amounts of log data.
1 code implementation • 29 May 2020 • Marco Morik, Ashudeep Singh, Jessica Hong, Thorsten Joachims
Rankings are the primary interface through which many online platforms match users to items (e.g., news, products, music, video).
2 code implementations • 12 May 2020 • Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
In this work, we present MOReL, an algorithmic framework for model-based offline RL.
1 code implementation • 19 Nov 2019 • Himank Yadav, Zhengxiao Du, Thorsten Joachims
is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons.
1 code implementation • NeurIPS 2019 • Ashudeep Singh, Thorsten Joachims
Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items.
no code implementations • 6 Nov 2018 • Yi Su, Lequn Wang, Michele Santacatterina, Thorsten Joachims
In addition, it is sub-differentiable such that it can be used for learning, unlike the SWITCH estimator.
no code implementations • 5 Nov 2018 • Zhichong Fang, Aman Agarwal, Thorsten Joachims
To overcome this limitation, we propose a Contextual Position-Based Model (CPBM) where the examination bias may also depend on a context vector describing the query and the user.
no code implementations • 9 Jun 2018 • Aman Agarwal, Ivan Zaitsev, Thorsten Joachims
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors.
no code implementations • 30 Apr 2018 • Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, Thorsten Joachims
Specifically, we derive a relaxation for propensity-weighted rank-based metrics which is subdifferentiable and thus suitable for gradient-based optimization.
no code implementations • 20 Feb 2018 • Ashudeep Singh, Thorsten Joachims
Rankings are ubiquitous in the online world today.
no code implementations • ICLR 2018 • Thorsten Joachims, Adith Swaminathan, Maarten de Rijke
We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.
no code implementations • 17 Mar 2017 • Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims
In this paper, we address the question of how to estimate the performance of a new target policy when we have log data from multiple historic policies.
no code implementations • 1 Dec 2016 • Damien Lefortier, Adith Swaminathan, Xiaotao Gu, Thorsten Joachims, Maarten de Rijke
The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.
no code implementations • 16 Aug 2016 • Thorsten Joachims, Adith Swaminathan, Tobias Schnabel
Implicit feedback (e.g., clicks, dwell times, etc.)
no code implementations • 25 Apr 2016 • Tobias Schnabel, Adith Swaminathan, Peter Frazier, Thorsten Joachims
Eliciting relevance judgments for ranking evaluation is labor-intensive and costly, motivating careful selection of which documents to judge.
1 code implementation • 23 Feb 2016 • Siddharth Reddy, Igor Labutov, Siddhartha Banerjee, Thorsten Joachims
Second, we use this memory model to develop a stochastic model for spaced repetition systems.
no code implementations • 23 Feb 2016 • Siddharth Reddy, Igor Labutov, Thorsten Joachims
In this paper, we present the Latent Skill Embedding (LSE), a probabilistic model of students and educational content that can be used to recommend personalized sequences of lessons with the goal of helping students prepare for specific assessments.
no code implementations • 17 Feb 2016 • Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself.
no code implementations • 5 Jan 2016 • Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena
We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots.
no code implementations • NeurIPS 2015 • Adith Swaminathan, Thorsten Joachims
This paper identifies a severe problem of the counterfactual risk estimator typically used in batch learning from logged bandit feedback (BLBF), and proposes the use of an alternative estimator that avoids this problem. In the BLBF setting, the learner does not receive full-information feedback like in supervised learning, but observes feedback only for the actions taken by a historical policy. This makes BLBF algorithms particularly attractive for training online systems (e.g., ad placement, web search, recommendation) using their historical logs. The Counterfactual Risk Minimization (CRM) principle offers a general recipe for designing BLBF algorithms.
no code implementations • 26 Oct 2015 • Tobias Schnabel, Paul N. Bennett, Susan T. Dumais, Thorsten Joachims
From a machine learning perspective, adding items to the shortlist generates a new implicit feedback signal as a by-product of exploration and decision making which can improve recommendation quality.
no code implementations • 9 Feb 2015 • Adith Swaminathan, Thorsten Joachims
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback.
no code implementations • 14 May 2014 • Nir Ailon, Thorsten Joachims, Zohar Karnin
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem.
no code implementations • 14 Apr 2014 • Karthik Raman, Thorsten Joachims
Thus, in this paper we study the problem of automatically inferring student grades from ordinal peer feedback, as opposed to existing methods that require cardinal peer feedback.
no code implementations • 28 Sep 2013 • Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih
Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set.
no code implementations • NeurIPS 2013 • Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena
In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks.
no code implementations • NeurIPS 2011 • Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena
In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes.
no code implementations • International Conference on Machine Learning 2008 • Filip Radlinski, Robert Kleinberg, Thorsten Joachims
Algorithms for learning to rank Web documents usually assume a document's relevance is independent of other documents.