We further instantiate the proposed unbiased relevance estimation framework in Baidu search, with comprehensive practical solutions designed regarding the data pipeline for click behavior tracking and online relevance estimation with an approximated deep neural network.
To leverage a reliable knowledge, we propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage.
In other words, GE-BERT can capture both the semantic information and the users' search behavioral information of queries.
More importantly, we present a practical system workflow for deploying the model in web-scale retrieval.
Based on it, a more robust doubly robust (MRDR) estimator has been proposed to further reduce its variance while retaining its double robustness.
However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues:(1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web-document, prohibit their deployments in an online ranking system that demands extremely low latency;(2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system;(3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system.
Early methods mainly fall into two paradigms with certain benefits and drawbacks: (1)Greedy algorithms, selecting seed nodes one by one, give a guaranteed accuracy relying on the accurate approximation of influence spread with high computational cost; (2)Heuristic algorithms, estimating influence spread using efficient heuristics, have low computational cost but unstable accuracy.
Social and Information Networks Data Structures and Algorithms F.2.2; D.2.8
We point out that the essential reason of the dilemma is the surprising fact that the submodularity, a key requirement of the objective function for a greedy algorithm to approximate the optimum, is not guaranteed in all conventional greedy algorithms in the literature of influence maximization.
Social and Information Networks Data Structures and Algorithms Physics and Society F.2.2; D.2.8