Multi-view methods learn representations by aligning multiple views of the same image, and their performance largely depends on the choice of data augmentation.
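For concreteness, here is a minimal sketch of one common instantiation of such an alignment objective, an InfoNCE-style contrastive loss over two augmented views; the temperature and the assumption of in-batch negatives are illustrative, not this paper's specific method:

```python
import torch
import torch.nn.functional as F

def view_alignment_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss aligning embeddings of two views of the same images.

    z1, z2: (n, d) embeddings of two random augmentations of the same n images.
    Matching pairs sit on the diagonal; all other pairs act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (n, n) cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```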
For multi-layer linear networks with vector outputs, we formulate convex dual problems and demonstrate that the duality gap is non-zero for networks of depth three and greater.
We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimization problem.
Since the late 1960s, there have been numerous successes in the exciting new frontier of asymmetric catalysis.
Our first contribution is to show that, at each iteration, the embedding dimension (or sketch size) can be as small as the effective dimension of the Hessian matrix.
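As a point of reference, the effective dimension of a regularized Hessian H is commonly defined as below; this is the standard definition, and the paper's exact variant may differ:

```latex
d_{\mathrm{eff}}(\lambda)
  = \operatorname{tr}\!\bigl( H \,(H + \lambda I)^{-1} \bigr)
  = \sum_{i} \frac{\lambda_i}{\lambda_i + \lambda},
```

where the λ_i are the eigenvalues of H and λ > 0 is a regularization parameter; when the spectrum decays quickly, this quantity is far smaller than the ambient dimension, which is what makes small sketch sizes plausible.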
We propose a projected Wasserstein gradient descent method (pWGD) for high-dimensional Bayesian inference problems.
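As a rough sketch of the underlying idea: Wasserstein gradient descent moves a set of particles along the Wasserstein gradient flow of the KL divergence to the posterior, with velocity ∇log π(x) − ∇log ρ(x), where the density term is typically estimated by kernel density estimation; the projected variant restricts updates to a low-dimensional subspace. In the sketch below, the subspace basis P, the step size, and the bandwidth are placeholder assumptions, not the paper's construction:

```python
import numpy as np

def kde_score(X, h):
    """Estimate grad log rho(x_i) at each particle via a Gaussian KDE."""
    diff = X[:, None, :] - X[None, :, :]            # (n, n, d) pairwise x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)                 # (n, n) squared distances
    K = np.exp(-sq / (2 * h ** 2))                  # Gaussian kernel matrix
    num = -(K[:, :, None] * diff / h ** 2).sum(axis=1)
    return num / K.sum(axis=1, keepdims=True)

def pwgd(grad_log_post, X, P, steps=500, eta=1e-2, h=0.5):
    """Projected Wasserstein gradient descent (illustrative sketch).

    grad_log_post: callable returning grad log pi(x) for each row of X
    P: (d, r) orthonormal basis of the projection subspace (assumed given)
    """
    proj = P @ P.T                                  # projector onto span(P)
    for _ in range(steps):
        v = grad_log_post(X) - kde_score(X, h)      # Wasserstein gradient of KL
        X = X + eta * (v @ proj)                    # update only within the subspace
    return X
```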
Disparate impact has raised serious concerns about machine learning applications and their societal impact.
Our bidirectional dynamic fusion strategy encourages dynamic interaction between spatial and temporal information.
Recently, sampling methods have been successfully applied to enhance the sample quality of Generative Adversarial Networks (GANs).
In this paper, we propose a unified approach that supports different generation paradigms in machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models.
An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieval, and ranking.
Adversarial Training (AT) has been proposed to alleviate the adversarial vulnerability of machine learning models by extracting only robust features from the input; however, this inevitably leads to a severe reduction in accuracy, as it discards the non-robust yet useful features.
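For reference, a minimal sketch of the standard PGD-based adversarial training loop (the formulation of Madry et al.; this paper's contribution concerns how AT treats non-robust features, not this recipe itself). The hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples (standard formulation)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # ascend the loss, then project back into the eps-ball
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # (clamping x + delta to the valid input range is omitted for brevity)
    return (x + delta).detach()

def at_step(model, optimizer, x, y):
    """One adversarial-training step: train on worst-case perturbed inputs."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```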
As additional consequences of our convex perspective: (i) we establish that Clarke stationary points found by stochastic gradient descent correspond to the global optimum of a subsampled convex problem; (ii) we provide a polynomial-time algorithm for checking whether a neural network is a global minimum of the training loss; (iii) we provide an explicit construction of a continuous path between any neural network and the global minimum of its sublevel set; and (iv) we characterize the minimal size of the hidden layer such that the neural network optimization landscape has no spurious valleys.
Our work builds on the recently proposed Deep CORAL method, which trains a convolutional neural network while simultaneously minimizing the Euclidean distance between the covariance matrices of the source and target domains.
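A minimal sketch of that covariance-alignment term, the CORAL loss from the Deep CORAL paper (the 1/(4d²) scaling follows the paper; batch handling is simplified):

```python
import torch

def coral_loss(source, target):
    """Deep CORAL loss: squared Frobenius distance between feature covariances.

    source: (n_s, d) source-domain activations of a chosen layer
    target: (n_t, d) target-domain activations of the same layer
    """
    d = source.size(1)

    def covariance(x):
        xc = x - x.mean(dim=0, keepdim=True)      # center the features
        return xc.t() @ xc / (x.size(0) - 1)      # unbiased covariance estimate

    c_s, c_t = covariance(source), covariance(target)
    return ((c_s - c_t) ** 2).sum() / (4 * d * d)
```

In Deep CORAL this term is added to the classification loss with a trade-off weight, so the network learns features that are simultaneously discriminative and domain-invariant.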