Search Results for author: Jiali Yao

Found 8 papers, 4 papers with code

High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion

no code implementations • 17 Apr 2025 • Libo Zhang, Yongsheng Yu, Jiali Yao, Heng Fan

Despite excellence, they ignore a hard constraint that the unmasked regions in the input and the output should be the same, resulting in a gap between GAN inversion and image inpainting and thus degrading the performance.

Generative Adversarial Network, Image Inpainting, +1

OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding

1 code implementation • 13 Mar 2025 • Jiali Yao, Xinran Deng, Xin Gu, Mengrui Dai, Bing Fan, Zhipeng Zhang, Yan Huang, Heng Fan, Libo Zhang

Each sequence in BOSTVG, paired with a free-form textual query, encompasses a varying number of targets ranging from 1 to 10.

Object, Video Grounding

Beyond MOT: Semantic Multi-Object Tracking

1 code implementation • 8 Mar 2024 • Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang

Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.

Multi-Object Tracking, Object, +1

Towards Realistic Visual Dubbing with Heterogeneous Sources

no code implementations • 17 Jan 2022 • Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.

Disentanglement, Talking Head Generation

Improving RNN transducer with normalized jointer network

no code implementations • 3 Nov 2020 • Mingkun Huang, Jun Zhang, Meng Cai, Yang Zhang, Jiali Yao, Yongbin You, Yi He, Zejun Ma

In this work, we analyze the cause of the huge gradient variance in RNN-T training and propose a new normalized jointer network to overcome it.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Real-time Neural-based Input Method

no code implementations • ICLR 2019 • Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama

The input method is an essential service on every mobile and desktop device that provides text suggestions.

Language Modeling, Language Modelling
