Search Results for author: Yalong Bai

Found 22 papers, 5 papers with code

Dynamic Prompt Optimizing for Text-to-Image Generation

2 code implementations • 5 Apr 2024 • Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen, Qing Yang

Users assign weights or alter the injection time steps of certain words in the text prompts to improve the quality of generated images.

Text-to-Image Generation

StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models

no code implementations • 25 Jan 2024 • Yalong Bai, Mohan Zhou, Qing Yang

The ability to fine-tune generative models for text-to-image generation tasks is crucial, particularly given the complexity involved in accurately interpreting and visualizing textual inputs.

Language Modelling, Text-to-Image Generation, +1

Learning and Evaluating Human Preferences for Conversational Head Generation

no code implementations • 20 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions.

Interactive Conversational Head Generation

no code implementations • 5 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao

Based on ViCo and ViCo-X, we define three novel tasks targeting interaction modeling during face-to-face conversation: 1) responsive listening head generation, making listeners respond actively to the speaker with non-verbal signals; 2) expressive talking head generation, guiding speakers to be aware of listeners' behaviors; and 3) conversational head generation, integrating the talking and listening abilities in one interlocutor.

Sentence, Talking Head Generation

Deep Equilibrium Multimodal Fusion

no code implementations • 29 Jun 2023 • Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei

Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.

Visual Question Answering (VQA)

Visual-Aware Text-to-Speech

no code implementations • 21 Jun 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Dynamically synthesizing talking speech that actively responds to a listening head is critical during face-to-face interaction.

Speech Synthesis

Learning cross space mapping via DNN using large scale click-through logs

no code implementations • 26 Feb 2023 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui

The image and query are mapped to a common vector space via these two parts respectively, and image-query similarity is naturally defined as an inner product of their mappings in the space.

Image Classification, Image Retrieval, +1

Visualizing and Understanding Patch Interactions in Vision Transformer

no code implementations • 11 Mar 2022 • Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions.

Freeform Body Motion Generation from Speech

1 code implementation • 4 Mar 2022 • Jing Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei

Motivated by studies in linguistics, we decompose the co-speech motion into two complementary parts: pose modes and rhythmic dynamics.

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

no code implementations • 27 Dec 2021 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital humans, virtual agents, and social robots.

Talking Head Generation, Translation

Directional Self-supervised Learning for Heavy Image Augmentations

no code implementations • CVPR 2022 • Yalong Bai, Yifan Yang, Wei Zhang, Tao Mei

Specifically, we adapt heavy augmentation policies after the views lightly augmented by standard augmentations, to generate a harder view (HV).

Representation Learning, Self-Supervised Learning

Augmentation Pathways Network for Visual Recognition

1 code implementation • 26 Jul 2021 • Yalong Bai, Mohan Zhou, Wei Zhang, BoWen Zhou, Tao Mei

Experimental results on ImageNet demonstrate the compatibility and effectiveness on a much wider range of augmentations, while consuming fewer parameters and lower computational costs at inference time.

Data Augmentation

Exploiting Relationship for Complex-scene Image Generation

no code implementations • 1 Apr 2021 • Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei

Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image.

Image Generation, Scene Generation

Products-10K: A Large-scale Product Recognition Dataset

1 code implementation • 24 Aug 2020 • Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, Wei Zhang

With the rapid development of electronic commerce, the way of shopping has experienced a revolutionary evolution.

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

2 code implementations • CVPR 2020 • Mohan Zhou, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei

Specifically, we first propose an object-extent learning module for localizing the object according to the visual patterns shared among the instances in the same category.

Fine-Grained Image Classification, Image Recognition, +7

Relationship-Aware Spatial Perception Fusion for Realistic Scene Layout Generation

no code implementations • 2 Sep 2019 • Hongdong Zheng, Yalong Bai, Wei Zhang, Tao Mei

In our framework, a spatial constraint module is designed to fit a reasonable scaling and spatial layout for object pairs while considering the relationship between them.

Image Generation, Object

VrR-VG: Refocusing Visually-Relevant Relationships

no code implementations • ICCV 2019 • Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei

Relationships encode the interactions among individual instances, and play a critical role in deep visual scene understanding.

Image Captioning, Question Answering, +3

Deep Attention Neural Tensor Network for Visual Question Answering

no code implementations • ECCV 2018 • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei

First, we model one of the pairwise interactions (e.g., image and question) by bilinear features, which are further encoded with the third dimension (e.g., answer) to form a triplet by a bilinear tensor product.

Deep Attention, Question Answering, +1

Automatic Dataset Augmentation

no code implementations • 28 Aug 2017 • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao

Large-scale image datasets and deep convolutional neural networks (DCNNs) are the two primary driving forces behind the rapid progress made in generic object recognition tasks in recent years.

Object Recognition

Visualizing and Comparing Convolutional Neural Networks

no code implementations • 20 Dec 2014 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui

Convolutional Neural Networks (CNNs) have achieved error rates comparable to those of well-trained humans on the ILSVRC2014 image classification task.

Classification, General Classification, +1

Learning High-level Image Representation for Image Retrieval via Multi-Task DNN using Clickthrough Data

no code implementations • 17 Dec 2013 • Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao

Image retrieval refers to finding relevant images in an image database for a given query, which is considered difficult because of the gap between the low-level representation of images and the high-level representation of queries.

Image Retrieval, Retrieval
