Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
The fashion industry has diverse applications in multi-modal image generation and editing.
By contrast, pixel-level optimization is more explicit, however, it is sensitive to the visual quality of training data and is not robust to object deformation.
With user history represented by a domain-aware sequential model, a frequency encoder is applied to the underlying tags for user content preference learning.
no code implementations • 1 Mar 2021 • Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1. 9TB images and 292GB texts that cover a wide range of domains.
The majority of current MTL studies adopt the hard parameter sharing structure, where hard layers tend to learn general representations over all tasks and specific layers are prone to learn specific representations for each task.
In e-commerce system, category prediction is to automatically predict categories of given texts.