The rendered image guides the training of the detail branch by effectively blending features from the two branches, which improves both warping accuracy and detail fidelity.
For unsupervised image-to-image translation, we propose a discriminator architecture that focuses on statistical features rather than individual patches.
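The snippet does not say which statistics the discriminator uses. As a hedged sketch, assuming per-channel mean and standard deviation of a feature map (a common choice, not confirmed by the source), the helpers below reduce a C x H x W feature map to a fixed-length statistics vector that a discriminator could score instead of individual patches. The function names are hypothetical.

```python
import math


def channel_statistics(feature_map):
    """Return (mean, std) for each channel of a C x H x W feature map.

    feature_map: list of channels, each a list of rows of floats.
    Uses the population standard deviation (divides by N).
    """
    stats = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        stats.append((mean, math.sqrt(var)))
    return stats


def statistics_vector(feature_map):
    """Flatten per-channel statistics into one vector, independent of H and W.

    Assumption: a statistics-based discriminator head would score this
    vector rather than each spatial patch of the feature map.
    """
    return [s for pair in channel_statistics(feature_map) for s in pair]
```

Because the vector length depends only on the number of channels, the score no longer depends on where in the image a given pattern appears.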
Expressivity plays a fundamental role in evaluating deep neural networks and is closely related to understanding the limits of performance improvement.
29 Dec 2020 • Xiu-Shen Wei, Yu-Yan Xu, Yazhou Yao, Jia Wei, Si Xi, Wenyuan Xu, Weidong Zhang, Xiaoxin Lv, Dengpan Fu, Qing Li, Baoying Chen, Haojie Guo, Taolue Xue, Haipeng Jing, Zhiheng Wang, Tianming Zhang, Mingwen Zhang
WebFG 2020 is an international challenge hosted by Nanjing University of Science and Technology, University of Edinburgh, Nanjing University, The University of Adelaide, Waseda University, etc.
In reinforcement learning (RL), we typically want the agent to explore as many states as possible early in training and then exploit the gathered information later to discover the trajectory with the highest return.
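One standard way to realize this explore-then-exploit schedule, shown here only as an illustration and not as the method of the paper summarized above, is epsilon-greedy action selection with a linearly decaying epsilon. The function names and default values (`eps_start`, `eps_end`, `decay_steps`) are assumptions.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly decay epsilon from eps_start to eps_end, then hold it fixed."""
    if step >= decay_steps:
        return eps_end
    frac = step / decay_steps
    return eps_start + (eps_end - eps_start) * frac


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


# Hypothetical usage: early steps mostly explore, later steps mostly exploit.
rng = random.Random(0)
q = [0.1, 0.9, 0.3]
early_action = epsilon_greedy(q, linear_epsilon(0), rng)
late_action = epsilon_greedy(q, linear_epsilon(50_000), rng)
```

A linear schedule is only one option; exponential decay or count-based exploration bonuses are common alternatives.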
The task of room layout estimation is to locate the wall-floor, wall-ceiling, and wall-wall boundaries.
Open-domain dialogue systems have achieved great success thanks to easily obtained single-turn corpora and advances in deep learning, but multi-turn dialogue remains challenging because of frequent coreference and information omission.
Ranked #1 on Dialogue Rewriting on Multi-Rewrite (BLEU-1 metric)
This linear model is then used to reduce redundant information in the left and right road images.
In this paper, we provide a quantitative analysis of the expressivity of deep neural networks (DNNs) based on their dynamic model, using Hilbert space to analyze convergence and criticality.
More specifically, we present an encoder-decoder network with a shared encoder and two separate decoders, each composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image.
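A transposed convolution upsamples its input by scattering a scaled copy of the kernel at each input position. The minimal sketch below is single-channel, 1-D, pure Python and unpadded, a simplification of the 2-D multi-channel layers described above; with padding 0 its output length matches the formula used by PyTorch's `ConvTranspose1d`. The two-decoder usage at the end, including its kernels and variable names, is hypothetical.

```python
def conv_transpose_1d(x, kernel, stride=1):
    """1-D transposed convolution (no padding, single channel).

    Each input element x[i] adds x[i] * kernel into the output starting at
    offset i * stride, so the output length is
    (len(x) - 1) * stride + len(kernel).
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    y = [0.0] * out_len
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            y[i * stride + j] += xi * kj
    return y


# Hypothetical usage: two decoders consume the same shared encoding,
# each with its own (here arbitrary) kernel.
shared_code = [0.5, 1.0, -0.5]
edge_map = conv_transpose_1d(shared_code, [1.0, 0.5], stride=2)
label_logits = conv_transpose_1d(shared_code, [0.2, 0.2, 0.2], stride=2)
```

Sharing the encoder means both tasks are trained on one representation, while separate decoders let each output keep its own upsampling weights.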