Specifically, a meta network fed with users' characteristic embeddings is learned to generate personalized bridge functions to achieve personalized transfer of preferences for each user.
A long-standing issue with paraphrase generation is how to obtain reliable supervision signals.
The delayed feedback problem is one of the imperative challenges in online advertising, which is caused by the highly diversified feedback delay of a conversion varying from a few minutes to several days.
To build up a benchmark for this problem, we publicize a large-scale dataset named PENS (PErsonalized News headlineS).
The proposed method is efficient as it can make decisions on-the-fly by utilizing only one randomly chosen model, but is also effective as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling.
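A minimal sketch of the decision step described above, assuming a bandit-style setting in which each ensemble member is simply a list of estimated rewards per arm. The function name `choose_arm` and this model representation are illustrative assumptions, not taken from the source.

```python
import random

def choose_arm(ensemble, rng):
    """Pick one ensemble member uniformly at random and act greedily on it.

    `ensemble` is a list of models; each model is assumed (for illustration)
    to be a list of estimated rewards, one entry per arm.
    """
    model = rng.choice(ensemble)  # only this single model is consulted
    return max(range(len(model)), key=lambda arm: model[arm])
```

Because the model is drawn at random, the chosen arm varies across calls in proportion to how many ensemble members favor it, which is the sense in which such a scheme can resemble posterior sampling.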
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.
Recent pretraining models for Chinese neglect two important aspects specific to the Chinese language: glyph and pinyin, which carry significant syntactic and semantic information for language understanding.
The adaptation can be achieved easily with most feed-forward network models by extending them with LMMD loss, which can be trained efficiently via back-propagation.
The encoder can automatically construct the population graph using phenotypic measures which have a positive impact on the final results, and further realizes the fusion of multimodal information.
With the advantage of meta learning which has good generalization ability to novel tasks, we propose a transfer-meta framework for CDR (TMCDR) which has a transfer stage and a meta stage.
Typical high quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio.
However, in real-world applications, the few-shot learning paradigm often suffers from data shift, i.e., samples in different tasks, even in the same task, could be drawn from various data distributions.
More efficient variants of FBWave can achieve up to 109x fewer MACs while still delivering acceptable audio quality.
In complex and noisy settings, model-based RL tends to have trouble using the model if it does not know when to trust the model.
In this paper, we propose the Dual Importance-aware Factorization Machines (DIFM), which exploit the internal field information within users' behavior sequences from dual perspectives, i.e., field value variations and field interactions, simultaneously for fraud detection.
In this paper, we propose a Graph Factorization Machine (GFM) which utilizes the popular Factorization Machine to aggregate multi-order interactions from neighborhood for recommendation.
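For reference, the standard second-order Factorization Machine that such a model builds on can be sketched as follows. This is the generic FM score only; the neighborhood aggregation of GFM is not shown, and the function and parameter names are illustrative assumptions.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for one feature vector x.

    w0 is the bias, w the linear weights, and V[i] the k-dimensional latent
    vector of feature i. The pairwise term uses the usual O(k * n) identity
    instead of looping over all feature pairs.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```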
On the one hand, we investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for accurate and explainable recommendation.
The transfer learning toolkit wraps the code of 17 transfer learning models and provides integrated interfaces, allowing users to use those models by calling a simple function.
In order to show the performance of different transfer learning models, over twenty representative transfer learning models are used for experiments.
In this work, we re-examine the problem of extractive text summarization for long documents.
Existing multi-view learning methods based on kernel functions either require the user to select and tune a single predefined kernel or have to compute and store many Gram matrices to perform multiple kernel learning.
In real-world machine learning applications, testing data may contain some meaningful new categories that have not been seen in labeled training data.
Detecting inaccurate smart meters and targeting them for replacement can save significant resources.
It is often observed that the probabilistic predictions given by a machine learning model can disagree with averaged actual outcomes on specific subsets of data, which is also known as the issue of miscalibration.
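One simple way to quantify this disagreement on a given subset is the difference between the mean predicted probability and the mean observed outcome. The sketch below assumes binary labels and a boolean mask selecting the subset; the name `calibration_gap` is illustrative.

```python
def calibration_gap(probs, labels, mask):
    """Mean predicted probability minus mean outcome on the masked subset.

    A value near zero means predictions agree with averaged outcomes on that
    subset; a positive value means the model is over-confident there.
    """
    selected = [(p, y) for p, y, m in zip(probs, labels, mask) if m]
    if not selected:
        raise ValueError("mask selects no samples")
    mean_p = sum(p for p, _ in selected) / len(selected)
    mean_y = sum(y for _, y in selected) / len(selected)
    return mean_p - mean_y
```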
To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are formed by taking different combinations of a few atom-mechanisms.
We propose Meta-Embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs.
Medical image segmentation has become an essential technique in clinical and research-oriented applications.
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games.
To this end, in this paper, we extend the existing KGE models TransE, TransH, and DistMult to learn knowledge representations by leveraging the information from the HRS.
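As background, the base scoring functions of two of these models can be sketched as below, using their commonly stated forms (translation distance for TransE, trilinear product for DistMult). The extension that incorporates the HRS is not shown, and the function names are illustrative.

```python
import math

def transe_score(h, r, t):
    """TransE: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """DistMult: sum over dimensions of h_i * r_i * t_i (higher is better)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))
```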
Additionally, a "low-level sharing, high-level splitting" structure of CNN is designed to handle the documents from different content domains.
An effective technique for filtering free-rider episodes is to use a partition model that divides an episode into two consecutive subepisodes and compares the observed support of such an episode with its expected support under the assumption that these two subepisodes occur independently.
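A deliberately simplified sketch of this comparison, assuming supports are expressed as relative frequencies in [0, 1] so that independence gives a product. The threshold `min_ratio` and both function names are hypothetical; the source does not specify the exact test.

```python
def expected_support(prefix_support, suffix_support):
    """Expected relative support if the two subepisodes occur independently."""
    return prefix_support * suffix_support

def is_free_rider(observed, prefix_support, suffix_support, min_ratio=1.0):
    """Flag the episode when its observed support does not exceed the
    independence baseline by the (assumed) factor `min_ratio`."""
    baseline = expected_support(prefix_support, suffix_support)
    return observed <= min_ratio * baseline
```

An episode is kept only when it occurs clearly more often than its two parts would by chance; in practice every possible split point would be checked, which this sketch leaves to the caller.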
We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations.
Then, with a proposed tree-structured search method, the model is able to generate the most probable responses in the form of dependency trees, which are finally flattened into sequences as the system output.