no code implementations • 15 Dec 2024 • Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, XiaoFeng Wang, Zhuang Ma, Baao Xie, Chao Ma, Xiaokang Yang, Wenjun Zeng
Within OccScene, the perception module can be effectively improved with customized and diverse generated scenes, while the perception priors in return enhance the generation performance for mutual benefits.
no code implementations • 11 Dec 2024 • Bohan Li, Xin Jin, Jiajun Deng, Yasheng Sun, XiaoFeng Wang, Wenjun Zeng
Camera-based 3D Semantic Occupancy Prediction (SOP) is crucial for understanding complex 3D scenes from limited 2D image observations.
3D Semantic Occupancy Prediction LIDAR Semantic Segmentation +2
no code implementations • 25 Feb 2024 • Yasheng Sun, Wenqing Chu, Hang Zhou, Kaisiyuan Wang, Hideki Koike
In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation.
no code implementations • 22 Jun 2023 • Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng
Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).
1 code implementation • 24 Mar 2023 • Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, XiaoFeng Wang, Yunnan Wang, Xin Jin, Wenjun Zeng
However, due to the inherent representation gap between stereo geometry and BEV features, it is non-trivial to bridge them for dense prediction task of SSC.
no code implementations • 14 Feb 2023 • Yasheng Sun, Qianyi Wu, Hang Zhou, Kaisiyuan Wang, Tianshu Hu, Chen-Chieh Liao, Shio Miyafuji, Ziwei Liu, Hideki Koike
Creating the photo-realistic version of people sketched portraits is useful to various entertainment purposes.
no code implementations • 9 Dec 2022 • Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike
This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.
1 code implementation • CVPR 2021 • Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu
While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.