no code implementations • 23 Nov 2022 • Tingting Zhao, Ying Wang, Wei Sun, Yarui Chen, Gang Niu, Masashi Sugiyama
Meanwhile, we divide the overall learning task into two parts: learning large-scale representation models in an unsupervised manner, and learning a small-scale policy model with reinforcement learning (RL). The small policy model makes policy learning easier, while the large representation models preserve generalization and expressiveness.
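The two-part split described above can be illustrated with a minimal sketch. It is not the authors' implementation: PCA stands in for the unsupervised representation learner, a linear Gaussian policy trained with REINFORCE stands in for the RL stage, and the toy reward, the dimensions, and all names (`encoder`, `encode`, `policy_w`, `true_w`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, LATENT_DIM = 32, 4  # hypothetical sizes, chosen for illustration

# Stage 1 (unsupervised): fit the "large" representation model on unlabeled states.
# PCA via SVD is an assumed stand-in for the representation learner.
states = rng.normal(size=(500, STATE_DIM))
mean = states.mean(axis=0)
_, _, vt = np.linalg.svd(states - mean, full_matrices=False)
encoder = vt[:LATENT_DIM].T.copy()   # shape (STATE_DIM, LATENT_DIM)
encoder_before = encoder.copy()      # kept only to show the encoder stays frozen

def encode(s):
    """Map raw states to latent features with the frozen encoder."""
    return (s - mean) @ encoder

# Stage 2 (RL): train only the small policy on latent features with REINFORCE.
# Toy one-step task (an assumption): reward is highest when the action
# matches a hidden linear function of the latent features.
true_w = np.array([1.0, -1.0, 0.5, 0.0])
policy_w = np.zeros(LATENT_DIM)      # the only parameters updated by RL
sigma, lr = 0.5, 0.05                # fixed exploration noise and step size

for step in range(300):
    batch = states[rng.integers(0, len(states), size=64)]
    z = encode(batch)                                   # (64, LATENT_DIM)
    mu = z @ policy_w                                   # policy mean per state
    actions = mu + sigma * rng.normal(size=mu.shape)    # sample actions
    rewards = -(actions - z @ true_w) ** 2
    advantages = rewards - rewards.mean()               # mean baseline
    # Gradient of log N(a | mu, sigma^2) with respect to policy_w.
    grad_log_pi = ((actions - mu) / sigma**2)[:, None] * z
    policy_w += lr * (advantages[:, None] * grad_log_pi).mean(axis=0)

print("representation parameters:", encoder.size)
print("policy parameters:", policy_w.size)
```

In this sketch the RL stage never touches `encoder`, so the policy gradient is taken over only `LATENT_DIM` parameters, while the representation (here `STATE_DIM * LATENT_DIM` parameters) is fitted once without any reward signal.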