2 code implementations • 5 Apr 2024 • Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen, Qing Yang
Users assign weights or alter the injection time steps of certain words in the text prompts to improve the quality of generated images.
no code implementations • 25 Jan 2024 • Yalong Bai, Mohan Zhou, Qing Yang
The ability to fine-tune generative models for text-to-image generation tasks is crucial, particularly given the complexity involved in accurately interpreting and visualizing textual inputs.
no code implementations • 20 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei
In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions.
no code implementations • 5 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao
Based on ViCo and ViCo-X, we define three novel tasks targeting interaction modeling during face-to-face conversation: 1) responsive listening head generation, making listeners respond actively to the speaker with non-verbal signals, 2) expressive talking head generation, guiding speakers to be aware of listeners' behaviors, and 3) conversational head generation, integrating the talking and listening abilities in one interlocutor.
no code implementations • 29 Jun 2023 • Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
no code implementations • 21 Jun 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei
Dynamically synthesizing talking speech that actively responds to a listening head is critical during face-to-face interaction.
no code implementations • 26 Feb 2023 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
The image and query are mapped to a common vector space via these two parts respectively, and image-query similarity is naturally defined as an inner product of their mappings in the space.
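A minimal sketch of the inner-product similarity described above, assuming simple linear mappings stand in for the two parts of the model. The function names, weight matrices, and feature values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_map(weights: Sequence[Sequence[float]]) -> Callable[[Vector], List[float]]:
    """Return a function that applies a fixed weight matrix to an input vector."""
    def apply(x: Vector) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def similarity(image_vec: Vector, query_vec: Vector) -> float:
    """Inner product of two embeddings that live in the same vector space."""
    if len(image_vec) != len(query_vec):
        raise ValueError("embeddings must share one dimensionality")
    return sum(a * b for a, b in zip(image_vec, query_vec))


# Hypothetical toy weights: 3-d image features and 2-d query features,
# both mapped into a 2-d common space.
embed_image = linear_map([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
embed_query = linear_map([[1.0, 0.0], [0.0, 1.0]])

image_features = [0.2, 0.4, 0.6]
query_features = [1.0, 0.0]

score = similarity(embed_image(image_features), embed_query(query_features))
```

Because both mappings end in the same space, ranking images for a query reduces to sorting them by this score.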
no code implementations • 11 Mar 2022 • Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei
Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions.
1 code implementation • 4 Mar 2022 • Jing Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei
Motivated by studies in linguistics, we decompose the co-speech motion into two complementary parts: pose modes and rhythmic dynamics.
no code implementations • 27 Dec 2021 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei
Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots.
no code implementations • CVPR 2022 • Yalong Bai, Yifan Yang, Wei Zhang, Tao Mei
Specifically, we apply heavy augmentation policies on top of views that have been lightly augmented by standard augmentations, in order to generate a harder view (HV).
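The ordering described above (a light, standard augmentation first, then a heavy one on top) can be sketched as follows. This is only an illustration under stated assumptions: a random horizontal flip stands in for the standard augmentations and a cutout patch stands in for the heavy policy. Neither choice, nor any of the names below, comes from the paper.

```python
import random
from typing import List

Image = List[List[float]]


def light_augment(img: Image, rng: random.Random) -> Image:
    """Standard (light) augmentation stand-in: random horizontal flip."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def heavy_augment(img: Image, rng: random.Random, size: int = 2) -> Image:
    """Heavy augmentation stand-in: zero out a random square patch (cutout)."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = 0.0
    return out


rng = random.Random(0)
image = [[1.0] * 4 for _ in range(4)]

standard_view = light_augment(image, rng)        # lightly augmented view
harder_view = heavy_augment(standard_view, rng)  # heavy policy applied on top
```

The harder view is derived from the lightly augmented view rather than from the raw image, which is the composition the abstract describes.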
1 code implementation • 26 Jul 2021 • Yalong Bai, Mohan Zhou, Wei Zhang, BoWen Zhou, Tao Mei
Experimental results on ImageNet demonstrate the compatibility and effectiveness on a much wider range of augmentations, while consuming fewer parameters and lower computational costs at inference time.
no code implementations • 1 Apr 2021 • Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei
Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image.
1 code implementation • 24 Aug 2020 • Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, Wei Zhang
With the rapid development of electronic commerce, the way people shop has undergone a revolutionary change.
2 code implementations • CVPR 2020 • Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei
Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category.
Ranked #17 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 2 Sep 2019 • Hongdong Zheng, Yalong Bai, Wei Zhang, Tao Mei
In our framework, a spatial constraint module is designed to fit a reasonable scaling and spatial layout for object pairs by considering the relationship between them.
no code implementations • ICCV 2019 • Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei
Relationships encode the interactions among individual instances, and play a critical role in deep visual scene understanding.
no code implementations • ECCV 2018 • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei
First, we model one of the pairwise interactions (e.g., image and question) by bilinear features, which are further encoded with the third dimension (e.g., answer) into a triplet by a bilinear tensor product.
no code implementations • 28 Aug 2017 • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao
Large-scale image datasets and deep convolutional neural networks (DCNNs) are the two primary driving forces behind the rapid progress made in generic object recognition tasks in recent years.
no code implementations • 20 Dec 2014 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
Convolutional Neural Networks (CNNs) have achieved error rates comparable to those of well-trained humans on the ILSVRC2014 image classification task.
no code implementations • 17 Dec 2013 • Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao
Image retrieval refers to finding images in an image database that are relevant to a query, which is considered difficult because of the gap between the low-level representation of images and the high-level representation of queries.