Simulators can provide valuable insights for researchers and practitioners who wish to improve recommender systems, because they allow one to easily tweak the experimental setup in which recommender systems operate, and as a result lower the cost of identifying general trends and uncovering novel findings about the candidate methods.
CSRec contains a teacher module that generates high-quality and confident soft labels and a student module that acts as the target recommender and is trained on the combination of dense, soft labels and sparse, one-hot labels.
We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, improving forecasting performance by 2% at the product level.
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence.
To address this limitation, recent studies enable generalization to an unseen target domain with only a few labeled examples using data augmentation techniques.
Prior work on bias mitigation often assumes that ranking scores, which correspond to the utility that a document holds for a user, can be accurately determined.
We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents.
The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query.
With group bias, the utility of the sensitive groups is underestimated; hence, without correcting for this bias, a supposedly fair ranking is not truly fair.
In next basket recommendation (NBR), it is useful to distinguish between repeat items, i.e., items that a user has consumed before, and explore items, i.e., items that a user has not consumed before.
To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
In this paper we propose RecFusion, which comprises a set of diffusion models for recommendation.
Successful applications of distributional reinforcement learning with quantile regression prompt a natural question: can we use other statistics to represent the distribution of returns?
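As a concrete instance of representing a return distribution by statistics, the quantile-regression variant estimates each quantile by minimising the pinball loss; the following is a minimal sketch on toy data (not the paper's setup, and the direct grid search stands in for gradient-based training):

```python
import numpy as np

def pinball_loss(pred, targets, tau):
    """Quantile (pinball) loss: its minimiser over `pred` is the
    tau-quantile of the target distribution."""
    diff = targets - pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# Recover three quantiles of a toy return distribution by direct search.
rng = np.random.default_rng(0)
returns = rng.normal(loc=10.0, scale=2.0, size=10_000)
grid = np.linspace(0.0, 20.0, 2001)
for tau in (0.1, 0.5, 0.9):
    estimate = grid[np.argmin([pinball_loss(g, returns, tau) for g in grid])]
    print(f"tau={tau:.1f}: estimated quantile ~ {estimate:.2f}")
```

Minimising the same loss with a neural network per quantile is what turns this statistic into a learnable representation of the return distribution.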
Recent advances in tabular question answering (QA) with large language models are constrained in their coverage and only answer questions over a single table.
The QPP task is to predict the retrieval quality of a search system for a query without relevance judgments.
Recent work on knowledge graph completion (KGC) focused on learning embeddings of entities and relations in knowledge graphs.
Recommender systems that learn from implicit feedback often use large volumes of a single type of implicit user feedback, such as clicks, to enhance the prediction of sparse target behavior such as purchases.
We therefore propose an outlier-aware click model that accounts for both outlier and position bias, called outlier-aware position-based model (OPBM).
In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic.
Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity, such as a document retriever, passage retriever, sentence retriever, and entity retriever, may help to achieve better performance on the end-to-end task.
For the CLTR field, our novel exposure-based risk minimization method enables practitioners to adopt CLTR methods in a safer manner that mitigates many of the risks attached to previous methods.
We prove that debiasedness is a necessary condition for recovering unbiased and consistent relevance scores and for the invariance of click prediction under covariate shift.
STEAM first corrects an input item sequence by adjusting the misclicked and/or missed items.
Our findings suggest that representation learning using generative models is a promising direction towards generalizable RL-based slate recommendation.
Additionally, we select two scene-centric datasets and three object-centric datasets, and determine the relative performance of the selected models on these datasets.
To address the limitation of sequential recommenders with side information, we define a way to fuse side information and alleviate the problem of missing side information by proposing a unified task, namely the missing information imputation (MII), which randomly masks some feature fields in a given sequence of items, including item IDs, and then forces a predictive model to recover them.
In this paper, we argue that the paradigm commonly adopted for offline evaluation of sequential recommender systems is unsuitable for evaluating reinforcement learning-based recommenders.
MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality.
We also propose a dynamic negative sampling strategy to capture the dynamic influence of biases by employing a bias-only model to dynamically select the most similar biased negative samples.
In traditional recommender system literature, diversity is often seen as the opposite of similarity, and typically defined as the distance between identified topics, categories or word models.
A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack.
Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting.
In this paper, we report on ongoing work regarding (i) the development of an AI system for flagging and explaining low-quality medical images in real-time, (ii) an interview study to understand the explanation needs of stakeholders using the AI system at OurCompany, and, (iii) a longitudinal user study design to examine the effect of including explanations on the workflow of the technicians in our clinics.
To address the above limitations, we propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components: (1) a difference vector generator, (2) a disentangled encoder, (3) a weight estimator, and (4) an attack model.
We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions, making the environment partially observable.
In this work, we discuss how to approach fairness of exposure in cases where the policy contains rankings of which, due to inter-item dependencies, we cannot reliably estimate the exposure distribution.
In response to these shortcomings, we reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2) a different RL method, (3) more state encoders, and (4) a different dataset.
The cross-entropy objective has proved to be an all-purpose training objective for autoregressive language models (LMs).
We add an additional decoder to the contrastive ICR framework, to reconstruct the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features.
Different from PL, where pointwise logits are used as the distribution parameters, in PPG the distribution is constructed from pairwise inversion probabilities together with a reference permutation.
What is the influence of user experience on the user satisfaction rating of TDS as opposed to, or in addition to, utility?
Motivated from these two angles, we propose a new task: summarization with graphical elements, and we verify that these summaries are helpful for a critical mass of people.
In this work, we study parameter-efficient abstractive QA in encoder-decoder models over structured tabular data and unstructured textual data using only 1.5% additional parameters for each modality.
We focus on the decision-based black-box attack setting, where the attackers cannot directly access the model information, but can only query the target model to obtain the rank positions of the partial retrieved list.
Employing existing user simulators to evaluate TDSs is challenging as user simulators are primarily designed to optimize dialogue policies for TDSs and have limited evaluation capabilities.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We formalize outlierness in a ranking, show that outliers are present in realistic datasets, and present the results of an eye-tracking study, showing that users' scanning order and the exposure of items are influenced by the presence of outliers.
One aspect of this data is a category tree that is being used in search and recommendation.
However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models.
We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased.
In this work, we explain the setup for a technical, graduate-level course on Fairness, Accountability, Confidentiality, and Transparency in Artificial Intelligence (FACT-AI) at the University of Amsterdam, which teaches FACT-AI concepts through the lens of reproducibility.
(1) there is no dataset with large-scale medical dialogues that covers multiple medical services and contains fine-grained medical labels (i.e., intents, actions, slots, values), and (2) there is no set of established benchmarks for MDSs for multi-domain, multi-service medical dialogues.
We propose a loss function, sigmoidF1, which is an approximation of the F1 score that (1) is smooth and tractable for stochastic gradient descent, (2) naturally approximates a multilabel metric, and (3) estimates label propensities and label counts.
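The construction behind such a surrogate can be sketched by replacing the hard 0/1 predictions in the confusion matrix with sigmoid outputs. The following is a toy illustration; the parameters `beta` and `eta` stand in for the tunable sigmoid parameters, and the exact form in the paper may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_f1_loss(logits, labels, beta=1.0, eta=0.0):
    """Smooth F1 surrogate: soft true/false positives and false negatives
    are computed from sigmoid scores instead of hard 0/1 predictions."""
    s = sigmoid(beta * (logits + eta))
    tp = np.sum(s * labels)
    fp = np.sum(s * (1 - labels))
    fn = np.sum((1 - s) * labels)
    soft_f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
    return 1.0 - soft_f1  # minimising this maximises the smooth F1

labels = np.array([1, 0, 1, 0, 1])
good = np.array([4.0, -4.0, 3.0, -3.0, 5.0])  # confident, correct logits
bad = -good                                   # confidently wrong logits
print(sigmoid_f1_loss(good, labels), sigmoid_f1_loss(bad, labels))
```

Because every term is differentiable in the logits, the surrogate can be dropped into a standard SGD training loop, unlike the exact F1 score.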
Affine correction (AC) is a generalization of IPS that corrects for position bias and trust bias.
HMCEval casts dialogue evaluation as a sample assignment problem, where we need to decide to assign a sample to a human or a machine for evaluation.
Given an incomplete narrative that specifies a main event and a context, we aim to retrieve news articles that discuss relevant events that would enable the continuation of the narrative.
Conversational Question Simplification (CQS) aims to simplify self-contained questions into conversational ones by incorporating some conversational characteristics, e.g., anaphora and ellipsis.
We seek to improve the performance for both frequent and rare ICD codes by using a contrastive graph-based EHR coding framework, CoGraph, which re-casts EHR coding as a few-shot learning task.
By doing so, the SR model is able to learn how to identify common and unique user preferences, and thereby achieve better user preference extraction and representation.
We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner.
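The core idea, leaves that carry a distribution rather than a point estimate, can be illustrated with a single regression stump whose leaves store both the mean and the variance of their training targets. This is only a toy sketch: PGBM realises the idea inside gradient boosting, and far more efficiently.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 1000)
y = np.where(x < 0, 0.0, 2.0) + rng.normal(0.0, 0.5, 1000)

# A single regression stump whose leaves store mean AND variance,
# so each prediction is a Gaussian rather than a point estimate.
split = 0.0
leaves = {
    name: (y[mask].mean(), y[mask].var())
    for name, mask in (("left", x < split), ("right", x >= split))
}

def predict_dist(x_new):
    """Return (mean, variance) of the predictive Gaussian for x_new."""
    return leaves["left" if x_new < split else "right"]

mu, var = predict_dist(0.5)
print(f"predictive mean {mu:.2f}, predictive std {np.sqrt(var):.2f}")
```

With a predictive distribution in hand, one can report calibrated intervals or sample scenarios, which a point-forecast ensemble cannot do directly.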
(2) We release a benchmark dataset, called Wizard of Search Engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
First, contract elements are far more fine-grained than named entities, which hinders the transfer of extractors.
The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like.
The proposed backward reasoning step pushes the model to produce more informative and coherent content because the forward generation step's output is used to infer the dialogue context in the backward direction.
These datasets were collected to inform different dialogue-based tasks including conversational search.
We find that CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in terms of response selection accuracy compared to state-of-the-art methods.
We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so.
In this work, we propose a method for generating CF explanations for GNNs: the minimal perturbation to the input (graph) data such that the prediction changes.
In this paper, we provide a systematic review of the techniques used in current CRSs.
In this paper, we propose the abstractive opinion tagging task, where systems have to automatically generate a ranked list of opinion tags that are based on, but need not occur in, a given set of user-generated reviews.
In this paper, we focus on purchase prediction for both anonymous and identified sessions on an e-commerce platform.
Motivated by our findings, we present ways to mitigate this mismatch in future research on automatic summarization: we propose research directions that impact the design, the development and the evaluation of automatically generated summaries.
With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios.
One of the key challenges in cross-domain sequential recommendation is to grasp and transfer the flow of information from multiple domains so as to promote recommendations in all domains.
Reinforcement learning methods have emerged as a popular choice for training an efficient and effective dialogue policy.
The proposed SGNN-HN applies a star graph neural network (SGNN) to model the complex transition relationship between items in an ongoing session.
Then, the traditional multi-label classification solution for dialogue policy learning is extended by adding dense layers to improve the dialogue agent performance.
Our main contribution is a new estimator based on affine corrections: it both reweights clicks and penalizes items displayed on ranks with high trust bias.
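The affine idea can be made concrete in a small simulation, assuming a position-based click model with examination parameters alpha_k and trust-bias parameters beta_k; all numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Position-dependent examination (alpha) and trust-bias (beta) parameters:
alpha = np.array([0.9, 0.6, 0.3])
beta = np.array([0.3, 0.1, 0.0])
true_rel = 0.5  # true relevance probability of one document
n = 200_000

rank = rng.integers(0, 3, size=n)            # document shown uniformly over ranks
p_click = alpha[rank] * true_rel + beta[rank]
clicks = rng.random(n) < p_click

naive = clicks.mean()                                  # biased click-through rate
affine = np.mean((clicks - beta[rank]) / alpha[rank])  # affine-corrected estimate
print(f"naive CTR: {naive:.3f}, affine-corrected: {affine:.3f} (true {true_rel})")
```

Subtracting beta_k penalises clicks gathered on high-trust ranks, and dividing by alpha_k reweights for examination; with beta_k = 0 the estimator reduces to plain IPS.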
In this paper, we define the task of Malevolent Dialogue Response Detection and Classification (MDRDC).
Instead of generating a response from scratch, P2-Net generates system responses by paraphrasing template-based responses.
LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach, where the algorithm decides what rankings to display.
As recent learning to match methods have made important advances in bridging the vocabulary gap for these traditional IR areas, we investigate their potential in the context of product search.
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior.
Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS.
The ability to engage in mixed-initiative interaction is one of the core requirements for a conversational search system.
Context from the conversational history can be used to arrive at a better expression of the current turn query, defined as the task of query resolution.
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system.
We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking.
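In symbols, the policy-aware propensity marginalises examination over the rankings the logging policy $\pi$ can show; the notation below is assumed here, following the standard IPS setup:

```latex
% Policy-aware propensity of document d for query q:
\rho(d \mid q) \;=\; \sum_{\bar{y}} \pi(\bar{y} \mid q)\,
    P\!\left(E = 1 \mid \operatorname{rank}(d, \bar{y})\right),
% with per-click IPS weight 1 / \rho(d \mid q).
```

The unbiasedness condition is then exactly $\rho(d \mid q) > 0$ for every relevant $d$: a relevant item that can never enter the top-$k$ receives no clicks and cannot be corrected for.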
We employ a modified self-attention mechanism to estimate item importance in a session, which is then used to predict the user's long-term preference.
To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking.
In this paper, we address the problem of answering complex information needs by conversing with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner.
Reinforcement Learning (RL) methods have emerged as a popular choice for training an efficient and effective dialogue policy.
We hypothesize that the deeper reason is that in the training corpora, there are hard tokens that are more difficult for a generative model to learn than others and, once learning has finished, hard tokens are still under-learned, so that repetitive generations are more likely to happen.
Our experiments using text classification and document retrieval confirm the above by comparing SEA (and a boundless variant called BSEA) to online and offline learning methods for contextual bandit problems.
Our contributions are three-fold: (1) We first present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario.
We introduce the bidirectional Scene Text Transformer (Bi-STET), a novel bidirectional STR method with a single decoder for bidirectional text decoding.
Relevance ranking aims at building a ranked list sorted in decreasing order of item relevance, while result diversification focuses on generating a ranked list of items that covers a broad range of topics.
Model interpretability has become an important problem in machine learning (ML) due to the increased effect that algorithmic decisions have on humans.
We propose a novel mixture-of-generators network (MoGNet) for DRG, where we assume that each token of a response is drawn from a mixture of distributions.
Finally, we find that relative positions heads seem integral to summarization performance and persistently remain after pruning.
Given a user, we first obtain a collaborative vector by collecting useful information with a collaborative memory module.
We conclude that existing metrics of disentanglement were created to reflect different characteristics of disentanglement and do not satisfy two basic desirable properties: (1) assign a high score to representations that are disentangled according to the definition; and (2) assign a low score to representations that are entangled according to the definition.
We study sequential recommendation in a particularly challenging context, in which multiple individual users share a single account (i.e., they have a shared account) and in which user behavior is available in multiple domains (i.e., recommendations are cross-domain).
We introduce the concept of interaction and propose a two-perspective interaction representation, that encapsulates a local and a global interaction representation.
Then, we design an Intent-aware Diversity Promoting (IDP) loss to supervise the learning of the IIM module and force the model to take recommendation diversity into consideration during training.
Given a conversational context and background knowledge, we first learn a topic transition vector to encode the most likely text fragments to be used in the next response, which is then used to guide the local KS at each decoding time step.
FARM improves visual understanding by incorporating the supervision of generation loss, which we hypothesize to be able to better encode aesthetic information.
We propose a novel approach for complex KGQA that uses unsupervised message passing, which propagates confidence scores obtained by parsing an input question and matching terms in the knowledge graph to a set of possible answers.
Given a large error, MC-BRP determines (1) feature values that would result in a reasonable prediction, and (2) general trends between each feature and the target, both based on Monte Carlo simulations.
Through randomization the effect of different types of bias can be removed from the learning process.
At the moment, two methodologies for dealing with bias prevail in the field of LTR: counterfactual methods that learn from historical data and model user behavior to deal with biases; and online methods that perform interventions to deal with bias but use no explicit user models.
The proceedings list for the program of FACTS-IR 2019, the Workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety in Information Retrieval, held at SIGIR 2019.
We propose a neural Modular Task-oriented Dialogue System (MTDS) framework, in which a few expert bots are combined to generate the response for a given dialogue context.
Deep feature learning extracts feature representations of users and items with a deep learning architecture based on a user-item rating matrix.
Understanding how "black-box" models arrive at their predictions has sparked significant interest from both within and outside the AI community.
We investigate whether distributions calculated by different attention heads in a transformer architecture can be used to improve transparency in the task of abstractive summarization.
Obtaining key information from a complex, long dialogue context is challenging, especially when different sources of information are available, e.g., the user's utterances, the system's responses, and results retrieved from a knowledge base (KB).
We consider the problem of identifying the K most attractive items and propose cascading non-stationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item.
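The cascade user model underlying this bandit variant is simple to simulate; the item names and attraction probabilities below are made up for illustration:

```python
import random

def cascade_click(attraction, ranked_list, rng):
    """A user scans the list top-down and clicks the first attractive item."""
    for pos, item in enumerate(ranked_list):
        if rng.random() < attraction[item]:
            return pos          # position of the single click
    return None                 # the user leaves without clicking

attraction = {"a": 0.8, "b": 0.5, "c": 0.2}
rng = random.Random(0)
clicks = [cascade_click(attraction, ["b", "a", "c"], rng) for _ in range(10_000)]
clicked = [c for c in clicks if c is not None]
top_share = sum(c == 0 for c in clicked) / len(clicked)
# "b" sits on top and absorbs most clicks despite "a" being more attractive.
print(f"share of clicks on position 0: {top_share:.2f}")
```

A cascading bandit must learn the attraction probabilities from such censored feedback: an unclicked item below a click was never examined, so its non-click carries no information.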
The visual appearance of a webpage carries valuable information about its quality and can be used to improve the performance of learning to rank (LTR).
Specifically, we first analyze the influence of the commonly used Cross-Entropy (CE) loss function, and find that the CE loss function prefers high-frequency tokens, which results in low-diversity responses.
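Why CE favours high-frequency tokens can be seen in a two-token toy example: the CE-optimal fixed prediction simply reproduces the corpus frequencies, so a model that falls back on marginal statistics keeps emitting the frequent token (illustrative numbers only):

```python
import numpy as np

# Token frequencies in a toy corpus: the first token is 9x more frequent.
freq = np.array([0.9, 0.1])

def expected_ce(pred):
    """Expected cross-entropy of a fixed predictive distribution."""
    return -np.sum(freq * np.log(pred))

# Search over predictive distributions p = (p, 1 - p):
ps = np.linspace(0.01, 0.99, 99)
losses = [expected_ce(np.array([p, 1.0 - p])) for p in ps]
best = ps[int(np.argmin(losses))]
print(f"CE-optimal mass on the frequent token: {best:.2f}")
```

This is the standard result that cross-entropy is minimised when the predicted distribution matches the data distribution, which here concentrates probability mass on the frequent token.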
Our main finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms the state-of-the-art dueling bandit algorithms.
The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator.
RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user's history and recommend them at the right time.
For measuring topical diversity of text documents, our HiTR approach improves over the state of the art on the PubMed dataset.
Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation.
The generated comments can be regarded as explanations for the recommendation results.
Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning.
Conversational systems have become increasingly popular as a way for humans to interact with computers.
In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.
We first present an annotation study, and based on our observations propose a formal task definition and annotation procedure for creating benchmark datasets for suggestion mining.
Existing learning to rank methods cannot handle such complex ranking settings as they assume that the display order is known beforehand.
KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts.
We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT).
We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training.
We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions.
In this paper we investigate the affordances of interactive storytelling as a tool to enable exploratory search within the framework of a conversational interface.
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval.
We discover how clusterings of experts correspond to committees in organizations, the ability of expert representations to encode the co-author graph, and the degree to which they encode academic rank.
Deep neural networks have become a primary tool for solving problems in many fields.
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them.
We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory.
The ability to perform effective off-policy learning would revolutionize the process of building better interactive systems, such as search engines and recommendation systems for e-commerce, computational advertising and news.
We detail the required background and terminology, a taxonomy grouping the rapidly growing body of work in the area, and then survey work on neural models for semantic matching in the context of three tasks: query suggestion, ad retrieval, and document retrieval.
The proposed method, called probabilistic feature selection and classification vector machine (PFCVM$_{LP}$), is able to simultaneously select relevant features and samples for classification tasks.
In this paper we propose a model of user behavior on a SERP that jointly captures click behavior, user attention and satisfaction, the CAS model, and demonstrate that it gives more accurate predictions of user actions and self-reported satisfaction than existing models based on clicks alone.
We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations.
We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches.
We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings.
Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery.