no code implementations • 27 Jun 2023 • Yuhao Cui, Xiongwei Wang, Zhongzhou Zhao, Wei Zhou, Haiqing Chen
However, these high-level semantic probabilities are often inaccurate and non-smooth at the phoneme level, which biases learning.
1 code implementation • ICCV 2023 • Yanzhao Zheng, Yunzhou Shi, Yuhao Cui, Zhongzhou Zhao, Zhiling Luo, Wei Zhou
To address this issue, we propose a novel framework called COOP (DeCOupling and COupling of Whole-Body GrasPing Pose Generation) to synthesize life-like whole-body poses that cover the widest range of human grasping capabilities.
1 code implementation • 16 Aug 2021 • Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu
Nevertheless, most existing VLP approaches have not fully utilized the intrinsic knowledge within the image-text pairs, which limits the effectiveness of the learned alignments and further restricts the performance of their models.
1 code implementation • 25 Apr 2020 • Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian
Most existing works focus on a single task and design neural architectures manually, which are highly task-specific and hard to generalize to different tasks.
Ranked #19 on Visual Question Answering (VQA) on VQA v2 test-std
no code implementations • 12 Aug 2019 • Zhou Yu, Yuhao Cui, Jun Yu, Dacheng Tao, Qi Tian
Learning an effective attention mechanism for multimodal data is important in many vision-and-language tasks that require a synergistic understanding of both the visual and textual content.
7 code implementations • CVPR 2019 • Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian
In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth.
Ranked #5 on Question Answering on SQA3D
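The cascaded-in-depth idea behind MCAN can be illustrated with a minimal NumPy sketch: each MCA layer applies self-attention within each modality, then guided attention where image regions attend to question words, and the layers are stacked. This is a simplified, hypothetical rendering — the actual MCAN uses multi-head attention with feed-forward, residual, and layer-normalization sublayers, all omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no projections).
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def mca_layer(y, x):
    # One simplified Modular Co-Attention (MCA) layer:
    # self-attention over question words, self-attention over
    # image regions, then guided attention where image features
    # are reweighted using the question features.
    y = attention(y, y, y)   # SA: question attends to itself
    x = attention(x, x, x)   # SA: image regions attend to each other
    x = attention(x, y, y)   # GA: image guided by question
    return y, x

rng = np.random.default_rng(0)
y = rng.standard_normal((14, 64))    # 14 question tokens, dim 64
x = rng.standard_normal((100, 64))   # 100 image region features
for _ in range(6):                   # cascade the layers in depth
    y, x = mca_layer(y, x)
print(y.shape, x.shape)
```

Stacking the layers lets later layers refine attention maps computed by earlier ones, which is the "cascaded in depth" aspect the abstract highlights; the per-modality feature dimensions are preserved at every depth.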