1 code implementation • 21 Sep 2023 • Taeho Kang, Kyungjin Lee, Jinrui Zhang, Youngki Lee
We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs.
1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng
In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.
1 code implementation • 17 Jun 2023 • Yunlong Tang, Jinrui Zhang, Xiangchen Wang, Teng Wang, Feng Zheng
This paper proposes an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) we use a pretrained LLM to generate high-quality, human-like captions.
1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao
Controllable image captioning is an emerging multimodal topic that aims to describe an image in natural language according to human intent, e.g., focusing on specified regions or writing in a particular text style.
1 code implementation • 11 Mar 2023 • Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo
Our framework extends easily to tasks covering visually grounded language understanding and generation.
1 code implementation • 3 Jul 2022 • Jinrui Zhang, Teng Wang, Feng Zheng, Ran Cheng, Ping Luo
Previous methods process the information of only a single boundary at a time and therefore fail to exploit video context information.