1 code implementation • CVPR 2022 • Yuanzhi Liang, Qianyu Feng, Linchao Zhu, Li Hu, Pan Pan, Yi Yang
Talking gesture generation is a practical yet challenging task which aims to synthesize gestures in line with speech.
Ranked #6 on Gesture Generation on TED Gesture Dataset
no code implementations • 1 Jun 2021 • Qianyu Feng, Bang Zhang, Yi Yang
In contrast, our goal is to represent a system with a part-whole hierarchy and discover the implied dependencies among intra-system variables, inferring the interactions that have causal effects on sub-system behavior with the REcurrent partItioned Network (REIN).
no code implementations • 2 May 2021 • Qianyu Feng, Linchao Zhu, Bang Zhang, Pan Pan, Yi Yang
Specifically, we aim to approximate the real joint distribution over the partial observation and latent variables, and thereby infer the unseen targets.
no code implementations • 18 Mar 2021 • Qianyu Feng, Yunchao Wei, MingMing Cheng, Yi Yang
Visual grounding is a long-standing problem in vision-language understanding due to its diversity and complexity.
no code implementations • 8 Mar 2021 • Qianyu Feng, Yawei Luo, Keyang Luo, Yi Yang
To generalize the model to real scenarios, we propose to address three aspects: (1) Look: visually incorporate spatial structure from the single view to enhance the expressiveness of the representation; (2) Cast: perceptually align the 2D image features to the 3D shape priors with cross-modal semantic contrastive mapping; (3) Mold: reconstruct the stereo shape of the target by transforming embeddings into the desired manifold.
1 code implementation • 6 Aug 2019 • Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang
With this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.
1 code implementation • ICCV 2019 • Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang
In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroid of target with the source.