Multimodal Chain-of-Thought Reasoning in Language Models

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer.

Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

Despite advancements in LLMs, knowledge-based reasoning remains a longstanding issue due to the fragility of knowledge recall and inference.

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations.

Unification-based Reconstruction of Multi-hop Explanations for Science Questions

This paper presents a novel framework for reconstructing multi-hop explanations in science Question Answering (QA).

Dynamic Semantic Graph Construction and Reasoning for Explainable Multi-hop Science Question Answering

Our framework contains three new ideas: (a) AMR-SG, an AMR-based Semantic Graph, constructed by candidate fact AMRs to uncover any hop relations among question, answer and multiple facts.

Exploiting Reasoning Chains for Multi-hop Science Question Answering

We propose a novel Chain Guided Retriever-reader (CGR) framework to model the reasoning chain for multi-hop Science Question Answering.

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.

Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering

We show the efficacy of our proposed approach in different tasks: abductive reasoning, commonsense question answering, science question answering, and sentence completion.

T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering

To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals.

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN.