no code implementations • 4 Feb 2024 • Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma
Therefore, we propose GeReA, a generate-reason framework that prompts an MLLM such as InstructBLIP with question-relevant vision and language information to generate knowledge-relevant descriptions, and then reasons over those descriptions for knowledge-based VQA.
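The generate-then-reason idea can be sketched as a two-stage pipeline. This is a minimal illustrative skeleton, not GeReA's actual implementation: the function names are hypothetical, and the model calls are stubs standing in for an MLLM (e.g. InstructBLIP) and a reasoning model.

```python
# Hypothetical sketch of a generate-then-reason VQA pipeline.
# Both stages are stubbed; in the real framework they would be
# calls to an MLLM and a reasoning model respectively.

def generate_descriptions(image, question, n=3):
    # Stage 1 (stub): prompt the MLLM with the image and question to
    # produce several question-relevant, knowledge-rich descriptions.
    return [f"description {i} relevant to: {question}" for i in range(n)]

def reason_over(descriptions, question):
    # Stage 2 (stub): use the generated descriptions as context and
    # reason over them to derive a final answer.
    context = "; ".join(descriptions)
    return f"answer to '{question}' given [{context}]"

def answer_vqa(image, question):
    # Full pipeline: generate knowledge-relevant descriptions, then reason.
    descriptions = generate_descriptions(image, question)
    return reason_over(descriptions, question)

result = answer_vqa(image=None, question="What sport is being played?")
```

The key design point is the separation of stages: the MLLM supplies external-knowledge-flavored descriptions, and a second reasoning step aggregates them into an answer.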
no code implementations • 5 Jul 2022 • Bin Li, Yixuan Weng, Ziyu Ma, Bin Sun, Shutao Li
To fully leverage the visual information for both scene understanding and dialogue generation, we propose the scene-aware prompt for the MDUG task.
no code implementations • 16 Oct 2021 • Ziyu Ma, Fuyan Ma, Bin Sun, Shutao Li
For the MuSe-Stress sub-challenge, we highlight our solutions in three aspects: 1) audio-visual features and bio-signal features are used for emotional state recognition.