no code implementations • 25 Aug 2024 • Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin
Audio-Visual Source Localization (AVSL) aims to localize the source of sound within a video.
1 code implementation • 28 May 2024 • Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin
Due to the occurrence of diverse EgoHOIs in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminished performance of current egocentric video-language models (EgoVLMs) on fine-grained concepts, indicating that these models still lack a full spectrum of egocentric understanding.
1 code implementation • 9 Mar 2024 • Boshen Xu, Sipeng Zheng, Qin Jin
We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task.
1 code implementation • 9 Mar 2024 • Boshen Xu, Sipeng Zheng, Qin Jin
We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view.
no code implementations • CVPR 2023 • Sipeng Zheng, Boshen Xu, Qin Jin
Human-object interaction (HOI) has long been plagued by the conflict between limited supervised data and a vast number of possible interaction combinations in real life.
no code implementations • 17 May 2021 • Andrey Ignatov, Andres Romero, Heewon Kim, Radu Timofte, Chiu Man Ho, Zibo Meng, Kyoung Mu Lee, Yuxiang Chen, Yutong Wang, Zeyu Long, Chenhao Wang, Yifei Chen, Boshen Xu, Shuhang Gu, Lixin Duan, Wen Li, Wang Bofei, Zhang Diankai, Zheng Chengjian, Liu Shaoli, Gao Si, Zhang Xiaofeng, Lu Kaidi, Xu Tianyu, Zheng Hui, Xinbo Gao, Xiumei Wang, Jiaming Guo, Xueyi Zhou, Hao Jia, Youliang Yan
Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services.