We compute a title-body matching score based on the representations of title and body enhanced by their interactions.
In addition, we propose an auxiliary term classification task to predict the types of the matched entity names, and jointly train it with the NER model to fuse both contexts and dictionary knowledge into NER.
Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other.
Evaluations in real-world scenarios across multiple datasets show that the proposed method enhances the robustness of federated learning against model poisoning attacks.
In this paper, instead of client uniform sampling, we propose a novel data uniform sampling strategy for federated learning (FedSampling), which can effectively improve the performance of federated learning especially when client data size distribution is highly imbalanced across clients.
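The data-uniform sampling idea can be illustrated with a minimal sketch (the function name and the Bernoulli-sampling scheme below are illustrative assumptions, not the paper's exact protocol): to draw roughly K samples from N total samples spread across clients, each client includes each local sample independently with probability K/N, so every sample is equally likely to participate no matter how imbalanced the per-client data sizes are.

```python
import random

def data_uniform_sample(client_data, k):
    """Illustrative data-uniform sampling: every sample across all clients
    is included independently with probability k / N, where N is the total
    number of samples, so each sample has equal selection probability
    regardless of which client holds it."""
    n_total = sum(len(d) for d in client_data.values())
    p = min(1.0, k / n_total)
    return {cid: [x for x in data if random.random() < p]
            for cid, data in client_data.items()}

# Usage: a highly imbalanced pair of clients still contributes
# in proportion to data size, not one vote per client.
clients = {"a": list(range(1000)), "b": list(range(10))}
picked = data_uniform_sample(clients, k=100)
```

In contrast, uniform client sampling would give client "b" the same influence as client "a" despite holding 100x less data.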
Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers.
While federated learning is promising for privacy-preserving collaborative learning without revealing local data, it remains vulnerable to white-box attacks and struggles to adapt to heterogeneous clients.
Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications.
In this work, we found that the backdoor attack can construct an artificial bias similar to the model bias derived in standard training.
To address these issues, we propose the REMI framework, consisting of an Interest-aware Hard Negative mining strategy (IHN) and a Routing Regularization (RR) method.
To address this issue, we propose GAS, an approach that can successfully adapt existing robust AGRs to non-IID settings.
Federated learning has recently been applied to recommendation systems to protect user privacy.
Since at each round the number of tunable parameters optimized on the server side equals the number of participating clients (and is thus independent of the model size), we are able to train a global model with massive parameters using only a small amount of proxy data (e.g., around one hundred samples).
It removes a user's contribution by rolling back and calibrating the historical parameter updates and then uses these updates to speed up federated recommender reconstruction.
In this paper, we propose an effective query-aware webpage snippet extraction method named DeepQSE, aiming to select a few sentences which can best summarize the webpage content in the context of input query.
This paper presents FedX, an unsupervised federated learning framework.
We consider the problem of personalised news recommendation where each user consumes news in a sequential fashion.
In order to learn a fair unified representation, we send it to each platform storing fairness-sensitive features and apply adversarial learning to remove bias from the unified representation inherited from the biased data.
Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework.
In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected.
The core idea of FUM is to concatenate the clicked news into a long document and transform user modeling into a document modeling task with both intra-news and inter-news word-level interactions.
To learn provider-fair representations from biased data, we employ provider-biased representations to inherit provider bias from data.
Existing methods for news recommendation usually model user interest from historical clicked news without the consideration of candidate news.
In this paper, we propose a semi-supervised fair representation learning approach based on adversarial variational autoencoder, which can reduce the dependency of adversarial fair models on data with labeled sensitive attributes.
In addition, we weight the distillation loss based on the overall prediction correctness of the teacher ensemble to distill high-quality knowledge.
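The correctness-based weighting can be sketched as follows (the exact weighting scheme, names, and loss form here are assumptions for illustration): each example's distillation loss is scaled by the fraction of teachers whose prediction on that example is correct, so unreliable ensemble signals contribute less.

```python
import numpy as np

def weighted_distill_loss(student_logp, teacher_probs, labels):
    """Illustrative correctness-weighted distillation: weight each example's
    distillation loss by the fraction of teachers that predict it correctly,
    so higher-quality ensemble knowledge is distilled more strongly."""
    # teacher_probs: (n_teachers, n_examples, n_classes)
    correct = (teacher_probs.argmax(-1) == labels)         # (T, N) bool
    weights = correct.mean(axis=0)                         # per-example weight
    mean_teacher = teacher_probs.mean(axis=0)              # ensemble soft labels
    per_example = -(mean_teacher * student_logp).sum(-1)   # soft cross-entropy
    return (weights * per_example).mean()
```

With this weighting, an example on which every teacher is wrong contributes nothing to the distillation loss.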
Different from existing news recommendation methods that are usually based on point- or pair-wise ranking, we propose LeaDivRec, a more effective list-wise news recommendation model.
Since candidate news selection can be biased, we propose to use a shared candidate-aware user model to match user interest against a really displayed candidate news item and a randomly sampled news item, respectively. This yields a candidate-aware user embedding that reflects user interest in the candidate news and a candidate-invariant user embedding that captures intrinsic user interest.
They are usually learned on historical user behavior data to infer user interests and predict future user behaviors (e.g., clicks).
In this paper, we propose a quality-aware news recommendation method named QualityRec that can effectively improve the quality of recommended news.
In this paper, we propose a very simple yet effective method named NoisyTune to help better finetune PLMs on downstream tasks by adding some noise to the parameters of PLMs before fine-tuning.
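A minimal sketch of the NoisyTune idea (the noise form and the value of the scaling hyperparameter below are assumptions for illustration): before fine-tuning, each parameter matrix is perturbed with small uniform noise scaled by that matrix's own standard deviation, so the perturbation is relative to the parameter scale.

```python
import numpy as np

def noisy_tune(params, lam=0.15, rng=None):
    """Illustrative NoisyTune-style perturbation: add uniform noise to each
    parameter matrix, scaled by the matrix's own standard deviation. lam is
    a small hyperparameter controlling the noise intensity (0.15 is an
    assumed default, not a prescribed value)."""
    rng = rng or np.random.default_rng(0)
    noisy = {}
    for name, w in params.items():
        noise = rng.uniform(-lam, lam, size=w.shape) * w.std()
        noisy[name] = w + noise
    return noisy

# Usage: perturb a toy "PLM" with one weight matrix before fine-tuning.
params = {"layer0.weight": np.random.default_rng(1).normal(size=(4, 4))}
perturbed = noisy_tune(params)
```

Scaling by the per-matrix standard deviation keeps the relative perturbation comparable across parameter matrices of very different magnitudes.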
In this way, all clients can participate in model learning in FL, and the final model can be sufficiently large and powerful.
Our study reveals a critical security issue in existing federated news recommendation systems and calls for research efforts to address the issue.
However, existing general FL poisoning methods for degrading model performance are either ineffective or insufficiently concealed when poisoning federated recommender systems.
To solve the game, we propose a platform negotiation method that simulates the bargaining among platforms and locally optimizes their policies via gradient descent.
Nowadays, due to breakthroughs in natural language generation (NLG), including machine translation, document summarization, and image captioning, NLG models have been encapsulated in cloud APIs that serve over half a billion people worldwide and process over one hundred billion word generations per day.
We further propose a two-stage knowledge distillation method to improve the efficiency of the large PLM-based news recommendation model while maintaining its performance.
However, the computation and communication costs of directly learning many existing news recommendation models in a federated way are unacceptable for user clients.
In this paper, we propose a unified news recommendation framework, which can utilize user data locally stored in user clients to train models and serve users in a privacy-preserving way.
Two self-supervision tasks are incorporated in UserBERT for user model pre-training on unlabeled user behavior data to empower user modeling.
Instead of directly communicating the large models between clients and server, we propose an adaptive mutual distillation framework to reciprocally learn a student and a teacher model on each client, where only the student model is shared by different clients and updated collaboratively to reduce the communication cost.
In this way, Fastformer can achieve effective context modeling with linear complexity.
We then sample token pairs based on their probability scores derived from the sketched attention matrix to generate different sparse attention index matrices for different attention heads.
News recommendation is often modeled as a sequential recommendation task, which assumes that there are rich short-term dependencies over historical clicked news.
Instead of following the conventional taxonomy of news recommendation methods, in this paper we propose a novel perspective to understand personalized news recommendation based on its core problems and the associated techniques and challenges.
It is important to eliminate the effect of position biases on the recommendation model to accurately target user interests.
Instead of a single user embedding, in our method each user is represented in a hierarchical interest tree to better capture their diverse and multi-grained interest in news.
It can effectively reduce the complexity and meanwhile capture global document context in the modeling of each sentence.
In addition, we propose a multi-teacher hidden loss and a multi-teacher distillation loss to transfer the useful knowledge in both hidden states and soft labels from multiple teacher PLMs to the student model.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
In this work, we bridge this gap by first presenting an effective model extraction attack, where the adversary can practically steal a BERT-based API (the target/victim model) by issuing only a limited number of queries.
Our method interactively models candidate news and user interest to facilitate their accurate matching.
The core of our method includes a bias representation module, a bias-aware user modeling module, and a bias-aware click prediction module.
Recall and ranking are two critical steps in personalized news recommendation.
Most existing news representation methods learn news representations only from news texts while ignoring the visual information in news, such as images.
Our PLM-empowered news recommendation models have been deployed to the Microsoft News platform, and achieved significant gains in terms of both click and pageview in both English-speaking and global markets.
Besides, the feed recommendation models trained solely on click behaviors cannot optimize other objectives such as user engagement.
However, existing language models are pre-trained and distilled on general corpora such as Wikipedia, which differ from the news domain and may be suboptimal for news intelligence.
To incorporate high-order user-item interactions, we propose a user-item graph expansion method that can find neighboring users with co-interacted items and exchange their embeddings for expanding the local user-item graphs in a privacy-preserving way.
The dwell time of news reading is an important clue for user interest modeling, since short reading dwell time usually indicates low and even negative interest.
We learn user representations from browsed news representations, and compute click scores based on user and candidate news representations.
Since the raw weighted real distances may not be optimal for adjusting self-attention weights, we propose a learnable sigmoid function to map them into re-scaled coefficients that have proper ranges.
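The learnable sigmoid re-scaling can be sketched as follows (this particular parameterization is an illustrative assumption, not necessarily the paper's exact form): a learnable scalar controls both the shape of the sigmoid and the upper bound of the output range, and the mapping is anchored so that a zero distance maps to a coefficient of 1.

```python
import numpy as np

def rescale(distances, v):
    """Illustrative learnable-sigmoid re-scaling: map weighted real distances
    into coefficients in (0, 1 + exp(v)), where v is a learnable scalar.
    The mapping is monotonically increasing and equals 1 at distance 0,
    so training can adjust both the shape and the range."""
    upper = 1.0 + np.exp(v)
    return upper / (1.0 + np.exp(v - distances))
```

Because the upper bound 1 + exp(v) is itself trainable, gradient descent can learn a proper coefficient range per attention head rather than fixing one by hand.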
We propose a query-value interaction function which can learn query-aware attention values, and combine them with the original values and attention weights to form the final output.
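A minimal sketch of query-value interaction attention (the interaction and gating matrices `w_inter` and `w_gate`, and this exact mixing form, are assumptions made for illustration): standard dot-product attention produces a context vector, a query-value interaction produces query-aware values, and a query-conditioned gate combines the two.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def query_value_attention(q, k, v, w_inter, w_gate):
    """Illustrative query-value interaction attention: compute the standard
    attention context, derive query-aware values from a query-context
    interaction, then mix the two with a sigmoid gate conditioned on the
    query (w_inter, w_gate, and this mixing are sketch assumptions)."""
    d = q.shape[-1]
    ctx = softmax(q @ k.T / np.sqrt(d)) @ v       # standard attention output
    inter = np.tanh((q * ctx) @ w_inter)          # query-aware values
    gate = 1.0 / (1.0 + np.exp(-(q @ w_gate)))    # query-conditioned gate
    return gate * inter + (1.0 - gate) * ctx

# Usage: 3 queries attending over 5 key/value vectors of dimension 8.
rng = np.random.default_rng(0)
d = 8
q, k, v = rng.normal(size=(3, d)), rng.normal(size=(5, d)), rng.normal(size=(5, d))
out = query_value_attention(q, k, v, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```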
Motivated by pre-trained language models, which are pre-trained on large-scale unlabeled corpora to empower many downstream tasks, in this paper we propose to pre-train user models from large-scale unlabeled user behavior data.
In this paper, we propose a multi-task neural network to perform emotion-cause pair extraction in a unified model.
On each platform a local user model is used to learn user embeddings from the local user behaviors on that platform.
News recommendation is an important technique for personalized news service.
Different from existing pooling methods that use a fixed pooling norm, we propose to learn the norm in an end-to-end manner to automatically find the optimal norms for text representation in different tasks.
Existing studies generally represent each user as a single vector and then match the candidate news vector, which may lose fine-grained information for recommendation.
In this paper, we propose a fairness-aware news recommendation approach with decomposed adversarial learning and orthogonality regularization, which can alleviate unfairness in news recommendation brought by the biases of sensitive user attributes.
Existing news recommendation methods achieve personalization by building accurate news representations from news content and user representations from their direct interactions with news (e.g., clicks), while ignoring the high-order relatedness between users and news.
Extensive experiments on a real-world dataset show the effectiveness of our method in news recommendation model training with privacy protection.
Since the labeled data in different platforms usually has some differences in entity type and annotation criteria, instead of constraining different platforms to share the same model, we decompose the medical NER model in each platform into a shared module and a private module.
In the user representation module, we propose an attentive multi-view learning framework to learn unified representations of users from their heterogeneous behaviors such as search queries, clicked news and browsed webpages.
In the review content-view, we propose to use a hierarchical model to first learn sentence representations from words, then learn review representations from sentences, and finally learn user/item representations from reviews.
Since different words and different news articles may have different informativeness for representing news and users, we propose to apply both word- and news-level attention mechanism to help our model attend to important words and news articles.
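The word- and news-level attention mechanisms can both be sketched as additive attention pooling (the parameter shapes and names below are illustrative assumptions): a learned scoring function assigns each item an informativeness score, and the representation is the score-weighted sum of item vectors. The same module pools word vectors into a news vector and news vectors into a user vector.

```python
import numpy as np

def additive_attention(h, w, b, u):
    """Illustrative additive attention pooling: score each row of h for
    informativeness with a small learned network (w, b, u are learnable
    parameters), normalize the scores with softmax, and return the
    weighted sum as the pooled representation."""
    scores = np.tanh(h @ w + b) @ u          # (n,) informativeness scores
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()              # attention weights over items
    return alpha @ h                         # pooled representation

# Usage: pool 6 word vectors of dimension 4 into one news vector.
rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))
news_vec = additive_attention(h, rng.normal(size=(4, 4)),
                              rng.normal(size=4), rng.normal(size=4))
```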
In the user encoder we learn the representations of users based on their browsed news and apply attention mechanism to select informative news for user representation learning.
In this paper, we propose a neural news recommendation approach which can learn both long- and short-term user representations.
Aspect term extraction (ATE) aims at identifying all aspect terms in a sentence and is usually modeled as a sequence labeling problem.
In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation.
In this paper, we propose a hierarchical attention model fusing latent factor model for rating prediction with reviews, which can focus on important words and informative reviews.
In this paper we propose a neural recommendation approach with personalized attention to learn personalized representations of users and items from reviews.
Luckily, the unlabeled data is usually easy to collect and many high-quality Chinese lexicons are off-the-shelf, both of which can provide useful information for CWS.
Besides, the training data for CNER in many domains is usually insufficient, and annotating enough training data for CNER is very expensive and time-consuming.
Then, to place both triples and mined logic rules in the same semantic space, all triples in the knowledge graph are represented in first-order logic.
Recently, deep neural networks have been successfully used for various classification tasks, especially for problems with massive, perfectly labeled training data.
This paper describes our system for the first and third shared tasks of the third Social Media Mining for Health Applications (SMM4H) workshop, which aim to detect tweets mentioning drug names and adverse drug reactions.
The experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.
In addition, we compare the performance of the softmax classifier and conditional random field (CRF) for sequential labeling in this task.
In order to address this task, we propose a system based on an attention CNN-LSTM model.
Thus, the aim of SemEval-2018 Task 10 is to predict whether a word is a discriminative attribute between two concepts.
Detecting irony is an important task to mine fine-grained information from social web messages.
Thus, SemEval-2018 Task 2 proposes an interesting and challenging task, i.e., predicting which emojis are evoked by text-based tweets.
Since existing Chinese valence-arousal resources are mainly at the word level and phrase-level resources are lacking, the Dimensional Sentiment Analysis for Chinese Phrases (DSAP) task aims to predict valence-arousal ratings for Chinese affective words and phrases automatically.
Instead of adapting source-domain sentiment classifiers, our approach adapts general-purpose sentiment lexicons to the target domain with the help of a small number of labeled samples, selected and annotated in an active learning manner, as well as domain-specific sentiment similarities among words mined from unlabeled target-domain samples.