no code implementations • SIGDIAL (ACL) 2022 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent’s mental state transition and teach the agent to make decisions guided by its value like a human.
1 code implementation • 21 May 2023 • Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia
We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.
1 code implementation • 2 May 2023 • Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang
The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.
2 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.
1 code implementation • 19 Apr 2023 • Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao
Chameleon synthesizes programs by composing various tools (e. g., LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules) for accomplishing complex reasoning tasks.
2 code implementations • 28 Mar 2023 • Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model.
1 code implementation • 20 Dec 2022 • Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, Kai-Wei Chang
Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life.
2 code implementations • 6 Dec 2022 • Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, Xiaodan Liang
Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation.
Ranked #3 on
Mathematical Reasoning
on PGPS9K
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on
Mathematical Reasoning
on Lila (OOD)
1 code implementation • 29 Sep 2022 • Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan
However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #3 on
Science Question Answering
on ScienceQA
no code implementations • 9 Mar 2022 • Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu
We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.
no code implementations • 12 Dec 2021 • Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu
To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first one trying to incorporate a value model into emotionally intelligent dialogue systems.
1 code implementation • 12 Dec 2021 • Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu
Current pre-training methods in computer vision focus on natural images in the daily-life context.
1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
Ranked #1 on
Visual Question Answering (VQA)
on IconQA
no code implementations • ACL 2021 • Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly.
Ranked #5 on
Dialog Relation Extraction
on DialogRE
1 code implementation • ACL 2021 • Pan Lu, Ran Gong, Shibiao Jiang, Liang Qiu, Siyuan Huang, Xiaodan Liang, Song-Chun Zhu
We further propose a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).
Ranked #1 on
Mathematical Question Answering
on GeoS
no code implementations • 12 Mar 2021 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent's mental state transition and teach the agent to make decisions guided by its value like a human.
no code implementations • AAAI Conference on Artificial Intelligence (AAAI 2020) 2020 • Wei Zhang, Yue Ying, Pan Lu, Hongyuan Zha
Personalized image caption, a natural extension of the standard image caption task, requires to generate brief image descriptions tailored for users’ writing style and traits, and is more practical to meet users’ real demands.
no code implementations • International Joint Conferences on Artifical Intelligence (IJCAI) 2019 • Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan
In this paper, we develop a Scene Concept Graph (SCG) by aggregating image scene graphs and extracting frequently co-occurred concept pairs as scene common-sense knowledge.
no code implementations • 2019 IEEE 35th International Conference on Data Engineering (ICDE) 2019 • Ning Liu, Pan Lu, Wei zhang, Jianyong Wang
To address the above issues, we propose novel Knowledge-aware Deep Dual Networks (K-DDN) for the text-based mortality prediction task.
no code implementations • Advances in Knowledge Discovery and Data Mining (PAKDD 2019) 2019 • Yuanquan Lu, Wei zhang, Pan Lu, Jianyong Wang
Nowadays, the online interactions between users and items become diverse, and may include textual reviews as well as numerical ratings.
no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
It can robustly capture the high-level interactions between language and vision domains, thus significantly improves the performance of visual question answering.
no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang
Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolute with visual features for capturing the textual and visual relationship in the early stage.
1 code implementation • 24 May 2018 • Pan Lu, Lei Ji, Wei zhang, Nan Duan, Ming Zhou, Jianyong Wang
To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA.
1 code implementation • 18 Nov 2017 • Pan Lu, Hongsheng Li, Wei zhang, Jianyong Wang, Xiaogang Wang
Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering.