Search Results for author: Pan Lu

Found 35 papers, 21 papers with code

Towards Socially Intelligent Agents with Mental State Transition and Human Value

no code implementations • SIGDIAL (ACL) 2022 Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu

One of these is to track the agent's mental state transitions and to teach the agent to make decisions guided by its values, as a human does.

VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

1 code implementation • 19 Jun 2024 Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

Visual programs are executable code generated by large language models to address visual reasoning problems.

Visual Reasoning
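
The execution-feedback idea behind VDebugger can be illustrated with a minimal sketch. This is not the paper's pipeline; `run_visual_program`, the `env` dictionary, and the feedback format are all hypothetical stand-ins for how a debugger model might consume a runtime error:

```python
import traceback

def run_visual_program(code: str, env: dict) -> tuple[object, str]:
    """Execute an LLM-generated program and return (result, feedback).

    On success the feedback is empty; on failure it carries the traceback,
    which a debugger model could use to propose a fix.
    """
    local_vars = dict(env)
    try:
        exec(code, {}, local_vars)           # run the candidate program
        return local_vars.get("answer"), ""
    except Exception:
        return None, traceback.format_exc()  # execution feedback for the debugger

# A buggy "visual program": the variable name is misspelled.
buggy = "counts = objects['dog']\nanswer = count + 1"
result, feedback = run_visual_program(buggy, {"objects": {"dog": 2}})
assert result is None and "NameError" in feedback

# The corrected program executes cleanly.
fixed = "answer = objects['dog'] + 1"
result, feedback = run_visual_program(fixed, {"objects": {"dog": 2}})
assert result == 3 and feedback == ""
```

A real system would feed the feedback string, together with the failing program, back to the model for another attempt.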

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

no code implementations • 30 May 2024 Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data and append its self-generated image descriptions to the prompts.

Image Comprehension • Visual Question Answering

Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

1 code implementation • 27 Feb 2024 Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng

To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.


Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

1 code implementation • 9 Jan 2024 Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural language inference, and question answering.

Model Editing • Natural Language Inference +1

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

1 code implementation • 20 Jul 2023 Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang

Most of the existing Large Language Model (LLM) benchmarks on scientific problem reasoning focus on problems grounded in high-school subjects and are confined to elementary algebraic operations.

Benchmarking • Language Modelling +2

TheoremQA: A Theorem-driven Question Answering dataset

1 code implementation • 21 May 2023 Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.

Math Question Answering
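
Program-of-Thoughts prompting, mentioned in the snippet above, has the model emit code whose execution yields the answer, in contrast to Chain-of-Thoughts' natural-language steps. A minimal sketch of the execution side follows; the generated program here is hand-written for illustration, and the `ans` variable convention is an assumption:

```python
def answer_from_program(program: str) -> object:
    """Run a Program-of-Thoughts style snippet and read off `ans`."""
    scope: dict = {}
    exec(program, scope)  # the model-written code computes the answer
    return scope["ans"]

# What a model might emit for "How many 3-element subsets does a 5-element set have?"
generated = """
from math import comb
ans = comb(5, 3)
"""
assert answer_from_program(generated) == 10
```

Delegating arithmetic to an interpreter this way avoids the calculation slips that plague purely textual reasoning chains.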

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation • 2 May 2023 Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Informativeness • Text-to-Image Generation

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations • 28 Apr 2023 Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Instruction Following • Optical Character Recognition (OCR) +7

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

1 code implementation • NeurIPS 2023 Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao

At the heart of Chameleon is an LLM-based planner that assembles a sequence of tools, which are then executed to generate the final response.

Logical Reasoning
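
The planner-plus-tools design described above can be sketched in a few lines. This is only a conceptual illustration, not Chameleon's implementation: the "planner" is a hard-coded stub standing in for the LLM, and the tool names are hypothetical:

```python
from typing import Callable

# A registry of plug-and-play tools; each consumes and updates a shared context.
TOOLS: dict[str, Callable[[dict], dict]] = {
    "table_reader": lambda ctx: {**ctx, "cells": [3, 4, 5]},
    "calculator":   lambda ctx: {**ctx, "value": sum(ctx["cells"])},
    "answerer":     lambda ctx: {**ctx, "answer": f"The total is {ctx['value']}."},
}

def plan(query: str) -> list[str]:
    """Stub planner: a real system would ask an LLM to compose this sequence."""
    if "total" in query:
        return ["table_reader", "calculator", "answerer"]
    return ["table_reader", "calculator", "answerer"]

def run(query: str) -> str:
    ctx: dict = {"query": query}
    for tool in plan(query):  # execute the planned tool sequence in order
        ctx = TOOLS[tool](ctx)
    return ctx["answer"]

assert run("What is the total of the column?") == "The total is 12."
```

New capabilities plug in by adding an entry to the registry, which is what makes the composition "plug-and-play": the executor never changes, only the planned sequence does.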

A Survey of Deep Learning for Mathematical Reasoning

1 code implementation • 20 Dec 2022 Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, Kai-Wei Chang

Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life.

Math • Mathematical Reasoning

UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression

2 code implementations • 6 Dec 2022 Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang

We also present a unified multi-task Geometric Transformer framework, Geoformer, that tackles calculation and proving problems simultaneously as sequence generation, showing that the unified formulation improves reasoning ability on both tasks.

Geometry Problem Solving • Logical Reasoning +1

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

2 code implementations • 29 Sep 2022 Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan

However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.

Logical Reasoning • Math +1

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

1 code implementation • 20 Sep 2022 Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.

Multimodal Deep Learning • Multimodal Reasoning +5
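
The lecture-then-explanation chain of thought described above amounts to a structured output format in which reasoning precedes the answer. A minimal prompt-assembly sketch, with field names that are hypothetical rather than ScienceQA's exact schema:

```python
def build_cot_target(lecture: str, explanation: str, answer: str) -> str:
    """Format a training target: reason first, answer last, so the model
    learns to produce the chain of thought before committing to an answer."""
    return f"LECTURE: {lecture}\nEXPLANATION: {explanation}\nANSWER: {answer}"

target = build_cot_target(
    lecture="A solution is a mixture that is uniform throughout.",
    explanation="Salt water looks the same everywhere, so it is a solution.",
    answer="solution",
)
assert target.splitlines()[-1] == "ANSWER: solution"
```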

Triangular Character Animation Sampling with Motion, Emotion, and Relation

no code implementations • 9 Mar 2022 Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu

We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.


ValueNet: A New Dataset for Human Value Driven Dialogue System

no code implementations • 12 Dec 2021 Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu

To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first to incorporate a value model into emotionally intelligent dialogue systems.

Dialogue Generation • Emotion Recognition +2

Learning from the Tangram to Solve Mini Visual Tasks

1 code implementation • 12 Dec 2021 Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu

Current pre-training methods in computer vision focus on natural images in the daily-life context.

Few-Shot Learning

IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning

1 code implementation • 25 Oct 2021 Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu

Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.

Arithmetic Reasoning • Math Word Problem Solving +2

Towards Socially Intelligent Agents with Mental State Transition and Human Utility

no code implementations • 12 Mar 2021 Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu

One of these is to track the agent's mental state transitions and to teach the agent to make decisions guided by its values, as a human does.

Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption

no code implementations • AAAI Conference on Artificial Intelligence (AAAI) 2020 Wei Zhang, Yue Ying, Pan Lu, Hongyuan Zha

Personalized image caption, a natural extension of the standard image caption task, requires generating brief image descriptions tailored to users' writing styles and traits, and is more practical for meeting users' real demands.

Image Captioning

Knowledge Aware Semantic Concept Expansion for Image-Text Matching

no code implementations • International Joint Conference on Artificial Intelligence (IJCAI) 2019 Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan

In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurred concept pairs as scene common-sense knowledge.

Common Sense Reasoning • Content-Based Image Retrieval +3
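
Extracting "frequently co-occurred concept pairs" from scene graphs, as the snippet above describes, can be sketched as simple pair counting. This is a rough illustration of building scene common-sense knowledge, not the paper's exact SCG construction; the threshold and data are made up:

```python
from collections import Counter
from itertools import combinations

def frequent_concept_pairs(scene_graphs: list[set[str]],
                           min_count: int = 2) -> set[tuple[str, str]]:
    """Keep concept pairs that co-occur in at least `min_count` images."""
    counts: Counter = Counter()
    for concepts in scene_graphs:
        for pair in combinations(sorted(concepts), 2):  # unordered co-occurrence
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_count}

# Each set is the concept vocabulary of one image's scene graph.
graphs = [{"dog", "frisbee", "grass"}, {"dog", "frisbee"}, {"cat", "sofa"}]
assert frequent_concept_pairs(graphs) == {("dog", "frisbee")}
```

Pairs surviving the threshold act as common-sense associations ("dog" goes with "frisbee") that can expand the concepts mentioned in a caption during image-text matching.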

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations • 13 Dec 2018 Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving the performance of visual question answering.

Question Answering • Visual Question Answering

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations • ECCV 2018 Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features, capturing the textual and visual relationship in the early stage.

Question Answering • Visual Question Answering
