Based on the high-level architecture, we then describe a concrete implementation of Baihe for PostgreSQL and present example use cases for learned query optimizers.
Cardinality estimation (CardEst), a central component of the query optimizer, plays a significant role in generating high-quality query plans in a DBMS.
1 code implementation • 13 Sep 2021 • Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, Bin Cui
Therefore, we propose a new metric, P-Error, to evaluate the performance of CardEst methods; it overcomes the limitations of Q-Error and reflects the overall end-to-end performance of CardEst methods.
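For context, the standard Q-Error is the symmetric ratio between estimated and true cardinality, which ignores how the error actually affects the chosen plan. A minimal sketch of both ideas follows; the `p_error` function is only an illustration of the plan-cost-ratio idea behind P-Error, and its exact definition in the paper may differ.

```python
def q_error(estimated: float, true: float) -> float:
    """Classic Q-Error: max of the two ratios between estimated and true
    cardinality. It weighs every query equally, regardless of whether the
    misestimate changes the optimizer's plan choice at all."""
    # Clamp to 1 to avoid division by zero, as is common in practice.
    estimated, true = max(estimated, 1.0), max(true, 1.0)
    return max(estimated / true, true / estimated)


def p_error(cost_with_estimates: float, cost_with_true_cards: float) -> float:
    """Illustrative P-Error-style metric: ratio of the cost of the plan the
    optimizer picks under estimated cardinalities to the cost of the plan it
    would pick under true cardinalities, so the score tracks end-to-end
    plan quality rather than raw estimation accuracy."""
    return cost_with_estimates / cost_with_true_cards
```

For example, `q_error(100, 1000)` and `q_error(1000, 100)` both give 10.0, even though an underestimate and an overestimate may lead the optimizer to very different plans; a plan-cost ratio distinguishes them.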
It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm.
Owing to the difficulty of mining spatial-temporal cues, existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios, and often fail to infer prominent objects.
Bayesian exploration strategies like Thompson Sampling resolve this trade-off in a principled way by modeling and updating the distribution of the parameters of the action-value function, the outcome model of the environment.
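Thompson Sampling can be made concrete with the textbook Bernoulli-bandit case: keep a Beta posterior per arm, sample a mean from each posterior, and pull the arm with the highest sample. This is a generic sketch of the algorithm, not the specific bandit setting of the papers above; the arm means and priors here are illustrative.

```python
import random


def thompson_sampling_bernoulli(n_arms, pulls, true_means, rng=random):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.

    Each round: sample a mean from every arm's Beta posterior, pull the
    arm with the highest sample, then update that arm's posterior with
    the observed 0/1 reward.
    """
    alpha = [1] * n_arms  # prior successes + 1
    beta = [1] * n_arms   # prior failures + 1
    total_reward = 0
    for _ in range(pulls):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta
```

Side information enters through the prior: replacing `Beta(1, 1)` with an informative prior over arm means concentrates exploration on plausible arms from the first pull.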
We propose to explore the transferability of the ML methods both across tasks and across databases to tackle these fundamental drawbacks.
Off-policy learning algorithms, in which an agent updates the value function of the optimal policy while selecting actions using an independent exploration policy, provide an effective solution to the explore-exploit tradeoff and have proven to be of great practical value in reinforcement learning.
Recently proposed deep learning based methods greatly improve estimation accuracy, but their performance can be heavily affected by the data, and they are often difficult to deploy in real systems.
To resolve this, we propose a new structure learning algorithm LEAST, which comprehensively fulfills our business requirements as it attains high accuracy, efficiency and scalability at the same time.
The Q-learning algorithm is known to be affected by the maximization bias, i.e., the systematic overestimation of action values, an important issue that has recently received renewed attention.
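The standard remedy for maximization bias is double Q-learning (van Hasselt, 2010): maintain two value tables and use one to select the greedy next action while the other evaluates it. A minimal tabular sketch, with state and action names purely illustrative:

```python
import random
from collections import defaultdict


def double_q_update(q1, q2, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.99, rng=random):
    """One tabular double Q-learning update.

    Decoupling action selection from action evaluation removes the
    systematic overestimation that plain Q-learning's max operator
    introduces when value estimates are noisy.
    """
    if rng.random() < 0.5:
        # select the greedy action with Q1, evaluate it with Q2
        best = max(actions, key=lambda nxt: q1[(s_next, nxt)])
        q1[(s, a)] += alpha * (r + gamma * q2[(s_next, best)] - q1[(s, a)])
    else:
        # select with Q2, evaluate with Q1
        best = max(actions, key=lambda nxt: q2[(s_next, nxt)])
        q2[(s, a)] += alpha * (r + gamma * q1[(s_next, best)] - q2[(s, a)])
```

Plain Q-learning would instead bootstrap from `max(Q[(s_next, nxt)])` of the same table it updates, so any arm whose value is overestimated by noise is both selected and used as the target, inflating estimates.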
We introduce factorize-sum-split-product networks (FSPNs), a new class of probabilistic graphical models (PGMs).
Despite decades of research, existing methods either oversimplify the model by relying only on independent factorization, which leads to inaccurate estimates, or overcomplicate it with lossless conditional factorization without any independence assumption, which results in slow probability computation.
We apply a reinforcement learning (RL) based approach to learning optimal synchronization policies used for Parameter Server-based distributed training of machine learning models with Stochastic Gradient Descent (SGD).
An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationships among potentially billions of elements.
1 code implementation • 17 Jul 2018 • E. Kelly Buchanan, Ian Kinsella, Ding Zhou, Rong Zhu, Pengcheng Zhou, Felipe Gerhard, John Ferrante, Ying Ma, Sharon Kim, Mohammed Shaik, Yajie Liang, Rongwen Lu, Jacob Reimer, Paul Fahey, Taliah Muhammad, Graham Dempsey, Elizabeth Hillman, Na Ji, Andreas Tolias, Liam Paninski
Calcium imaging has revolutionized systems neuroscience, providing the ability to image large neural populations with single-cell resolution.
For optimization on large-scale data, exactly computing the solution may be computationally difficult because of the sheer size of the data.
In modern data analysis, random sampling is an efficient and widely-used strategy to overcome the computational difficulties brought by large sample size.
For the unweighted estimation algorithm, we show that the resulting subsample estimator is not consistent with the full-sample OLS estimator.
Subsampling is an efficient strategy to handle this problem.
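The subsampling idea above can be sketched for simple linear regression: fit OLS on a random subsample instead of the full data. This is a generic illustration, not the papers' estimators; under uniform sampling the unweighted subsample slope targets the full-sample OLS slope, while under non-uniform sampling probabilities the unweighted version can be inconsistent, which motivates weighted corrections.

```python
import random


def ols_slope(xs, ys):
    """Simple-regression OLS slope:
    sum((x - xbar) * (y - ybar)) / sum((x - xbar) ** 2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx


def subsample_ols_slope(xs, ys, m, rng=random):
    """Fit OLS on a uniform random subsample of size m.

    Cuts the cost of the fit from O(n) to O(m) per coefficient, at the
    price of extra variance from the sampling step.
    """
    idx = rng.sample(range(len(xs)), m)
    return ols_slope([xs[i] for i in idx], [ys[i] for i in idx])
```

On noiseless data `y = 2x`, both the full-sample and any subsample slope recover 2.0 exactly; on noisy data the subsample estimate fluctuates around the full-sample one with variance shrinking as `m` grows.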