Search Results for author: Jinsong Zhang

Found 21 papers, 8 papers with code

SpeechAct: Towards Generating Whole-body Motion from Speech

no code implementations 29 Nov 2023 Jinsong Zhang, Minjie Zhu, Yuxiang Zhang, Yebin Liu, Kun Li

Then, we regress the motion representation from the audio signal by a translation model employing our contrastive motion learning method.

High-Quality Animatable Dynamic Garment Reconstruction from Monocular Videos

no code implementations 2 Nov 2023 Xiongzheng Li, Jinsong Zhang, Yu-Kun Lai, Jingyu Yang, Kun Li

To alleviate the ambiguity in estimating 3D garments from monocular videos, we design a multi-hypothesis deformation module that learns spatial representations of multiple plausible deformations.

Garment Reconstruction

Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers

no code implementations 30 Oct 2023 Jinsong Zhang, Lingfeng Gu, Yu-Kun Lai, Xueyang Wang, Kun Li

To explore the potential spatio-temporal relationship, we propose spatio-temporal transformers to simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner.

Narrator: Towards Natural Control of Human-Scene Interaction Generation via Relationship Reasoning

no code implementations ICCV 2023 Haibiao Xuan, Xiongzheng Li, Jinsong Zhang, Hongwen Zhang, Yebin Liu, Kun Li

Also, we model global and local spatial relationships in a 3D scene and a textual description respectively based on the scene graph, and introduce a part-level action mechanism to represent interactions as atomic body part states.

Text-Aware End-to-end Mispronunciation Detection and Diagnosis

1 code implementation 15 Jun 2022 Linkai Peng, Yingming Gao, Binghuai Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang

In the field of assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher.

Contrastive Learning

High-Fidelity Human Avatars From a Single RGB Camera

no code implementations CVPR 2022 Hao Zhao, Jinsong Zhang, Yu-Kun Lai, Zerong Zheng, Yingdi Xie, Yebin Liu, Kun Li

To cope with the complexity of textures and generate photo-realistic results, we propose a reference-based neural rendering network and exploit a bottom-up sharpening-guided fine-tuning strategy to obtain detailed textures.

Neural Rendering, Vocal Bursts Intensity Prediction

Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

1 code implementation 7 Jul 2021 Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang

Automatic speech recognition systems have been largely improved in the past few decades and current systems are mainly hybrid-based and end-to-end-based.

Automatic Speech Recognition (ASR) +1

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

no code implementations 6 May 2021 Dengfeng Ke, Jinsong Zhang, Yanlu Xie, Yanyan Xu, Binghuai Lin

With all these modifications, the size of the PHASEN model is shrunk from 33M parameters to 5M parameters, while the performance on VoiceBank+DEMAND is improved to the CSIG score of 4.30, the PESQ score of 3.07 and the COVL score of 3.73.

Speech Enhancement

A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques

1 code implementation 17 Apr 2021 Kaiqi Fu, Jones Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang, Binghuai Lin

Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative to greatly simplify the model-building process of conventional hybrid DNN-HMM systems by representing complicated modules with a single deep network architecture.

Data Augmentation

PISE: Person Image Synthesis and Editing with Decoupled GAN

1 code implementation CVPR 2021 Jinsong Zhang, Kun Li, Yu-Kun Lai, Jingyu Yang

The results of qualitative and quantitative experiments demonstrate the superiority of our model on human pose transfer.

Human Parsing, Pose Transfer

Human Pose Transfer by Adaptive Hierarchical Deformation

1 code implementation 13 Dec 2020 Jinsong Zhang, Xingzi Liu, Kun Li

Existing methods cannot effectively utilize the input information and often fail to preserve the style and shape of hair and clothes.

Pose Transfer, Semantic Parsing +1

PoNA: Pose-guided Non-local Attention for Human Pose Transfer

1 code implementation 13 Dec 2020 Kun Li, Jinsong Zhang, Yebin Liu, Yu-Kun Lai, Qionghai Dai

In each block, we propose a pose-guided non-local attention (PoNA) mechanism with a long-range dependency scheme to select more important regions of image features to transfer.

Generative Adversarial Network, Person Re-Identification +1

Adaptive 3D Face Reconstruction from a Single Image

no code implementations 8 Jul 2020 Kun Li, Jing Yang, Nianhong Jiao, Jinsong Zhang, Yu-Kun Lai

3D face reconstruction from a single image is a challenging problem, especially under partial occlusions and extreme poses.

3D Face Reconstruction, Pose Estimation

All-Weather Deep Outdoor Lighting Estimation

no code implementations CVPR 2019 Jinsong Zhang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Sunil Hadap, Jonathan Eisenmann, Jean-François Lalonde

We use this network to label a large-scale dataset of LDR panoramas with lighting parameters and use them to train our single image outdoor lighting estimation network.

Lighting Estimation

Deep Photovoltaic Nowcasting

no code implementations 15 Oct 2018 Jinsong Zhang, Rodrigo Verschae, Shohei Nobuhara, Jean-François Lalonde

Our experiments reveal that the MLP network, already used similarly in previous work, achieves an RMSE skill score of 7% over the commonly-used persistence baseline on the 1-minute future photovoltaic power prediction task.

Management

Phoneme Set Design Using English Speech Database by Japanese for Dialogue-Based English CALL Systems

no code implementations LREC 2014 Xiaoyun Wang, Jinsong Zhang, Masafumi Nishida, Seiichi Yamamoto

This paper describes a method of generating a reduced phoneme set for dialogue-based computer-assisted language learning (CALL) systems.

Language Modelling, Speech Recognition +1
