no code implementations • 22 Apr 2024 • Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang
Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro.
no code implementations • 16 Dec 2023 • Mengzhao Jia, Can Xie, Liqiang Jing
Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effects of biased textual factors and achieve robust out-of-distribution (OOD) generalization.
1 code implementation • 15 Nov 2023 • Zhihan Zhang, Dong-Ho Lee, Yuwei Fang, Wenhao Yu, Mengzhao Jia, Meng Jiang, Francesco Barbieri
Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions.
1 code implementation • 2 Nov 2023 • Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du
We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs).
no code implementations • 11 Oct 2023 • Mengzhao Jia, Qianglong Chen, Liqiang Jing, Dawei Fu, Renyu Li
The prevalence of mental disorders has become a significant issue, leading to increased focus on Emotional Support Conversation as an effective supplement to mental health support.
1 code implementation • 29 Jun 2023 • Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie
Multimodal Sarcasm Explanation (MuSE) is a new and challenging task that aims to generate a natural language sentence for a multimodal social post (an image together with its caption) explaining why the post contains sarcasm.