Search Results for author: Mengkang Hu

Found 14 papers, 5 papers with code

Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

no code implementations • 18 Feb 2025 • Mengkang Hu, Tianxing Chen, Yude Zou, YuHeng Lei, Qiguang Chen, Ming Li, Hongyuan Zhang, Wenqi Shao, Ping Luo

Recently, there has been growing interest in leveraging large language models (LLMs) to generate symbolic world models from textual descriptions.

Benchmarking

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

no code implementations • 23 Dec 2024 • Xinmiao Yu, Xiaocheng Feng, Yun Li, Minghui Liao, Ya-Qi Yu, Xiachong Feng, Weihong Zhong, Ruihan Chen, Mengkang Hu, Jihao Wu, Dandan Tu, Duyu Tang, Bing Qin

To mitigate this issue, we propose MVCL-MI (Maximization of Vision-Language Cross-Lingual Mutual Information), where a visual-text cross-lingual alignment is built by maximizing mutual information between the model's outputs and visual information.
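The stated objective, maximizing mutual information between the model's outputs and the visual input, can be written in the standard form below; the notation (outputs $Y$, visual information $V$, parameters $\theta$) is generic and assumed here, not taken from the paper:

```latex
\max_{\theta} \; I(Y; V) \;=\; \mathbb{E}_{p(y, v)}\!\left[ \log \frac{p(y \mid v)}{p(y)} \right]
```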

Question Answering · Visual Question Answering

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

no code implementations • 30 Oct 2024 • Junting Chen, Checheng Yu, Xunzhe Zhou, Tianqi Xu, Yao Mu, Mengkang Hu, Wenqi Shao, Yikai Wang, Guohao Li, Lin Shao

Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone.

Large Language Model · Object Rearrangement +1

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

1 code implementation • 18 Aug 2024 • Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo

Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to decide proactively to replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal.
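The memory-management idea described above (summarize finished subgoals, keep raw action-observation pairs only for the current one) can be sketched in a few lines. This is a hypothetical illustration; the class and method names are assumptions, not HiAgent's actual code:

```python
# Hypothetical sketch of subgoal-scoped working memory, inspired by the
# HiAgent abstract; structure and names are assumptions, not the paper's code.
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str
    observation: str

@dataclass
class WorkingMemory:
    completed_summaries: list = field(default_factory=list)  # one summary per finished subgoal
    current_steps: list = field(default_factory=list)        # raw steps for the active subgoal

    def record(self, action: str, observation: str) -> None:
        self.current_steps.append(Step(action, observation))

    def finish_subgoal(self, summary: str) -> None:
        # Replace the detailed trajectory of the finished subgoal with a short
        # summarized observation, keeping the LLM's context window small.
        self.completed_summaries.append(summary)
        self.current_steps.clear()

    def as_context(self) -> str:
        lines = [f"[done] {s}" for s in self.completed_summaries]
        lines += [f"{s.action} -> {s.observation}" for s in self.current_steps]
        return "\n".join(lines)

mem = WorkingMemory()
mem.record("open drawer", "drawer contains a key")
mem.finish_subgoal("found the key in the drawer")
mem.record("walk to door", "door is locked")
print(mem.as_context())
```

The point of the sketch is that only the summary of a completed subgoal survives, so context length grows with the number of subgoals rather than the number of raw steps.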

Language Modeling · Language Modelling +2

AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

1 code implementation • 1 Aug 2024 • Mengkang Hu, Pu Zhao, Can Xu, Qingfeng Sun, JianGuang Lou, Qingwei Lin, Ping Luo, Saravan Rajmohan

Moreover, to increase the difficulty diversity of generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, that evolves planning tasks from easier and harder directions to synthesize a task set with a smoother difficulty curve.

Diversity · Language Modeling +2

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

no code implementations • 14 Jun 2024 • Zeyu Gao, Yao Mu, Jinye Qu, Mengkang Hu, Shijia Peng, Chengkai Hou, Lingyue Guo, Ping Luo, Shanghang Zhang, YanFeng Lu

Extensive experiments demonstrate the superiority of DAG-Plan over directly using an LLM to generate a linear task sequence, achieving 52.8% higher efficiency than single-arm task planning and a 48% higher success rate in dual-arm task planning.
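The core idea of a dependency DAG for dual-arm planning can be illustrated with a toy scheduler: tasks whose dependencies are all met can run in parallel, one per arm. The task names and scheduling policy below are illustrative assumptions, not DAG-Plan's actual algorithm:

```python
# Toy dual-arm schedule from a dependency DAG, using the standard library's
# topological sorter; tasks and edges are invented for illustration.
from graphlib import TopologicalSorter

deps = {
    "grasp_bowl": set(),
    "grasp_cup": set(),
    "place_bowl": {"grasp_bowl"},
    "pour": {"grasp_cup", "place_bowl"},
}

ts = TopologicalSorter(deps)
ts.prepare()
schedule = []
while ts.is_active():
    ready = sorted(ts.get_ready())
    # All currently ready tasks are mutually independent, so up to two of
    # them (one per arm) can execute in the same time step.
    schedule.append({f"arm_{i % 2}": task for i, task in enumerate(ready)})
    ts.done(*ready)
print(schedule)
```

With these dependencies the two grasps run in parallel in step one, while the sequential tail (`place_bowl`, then `pour`) falls back to a single arm, which is the kind of parallelism a linear task sequence cannot express.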

Task Planning

Needle In A Multimodal Haystack

1 code implementation • 11 Jun 2024 • Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang

In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
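The needle-in-a-haystack protocol behind such benchmarks can be shown with a toy text-only version: hide a "needle" fact at a random position in a long document and check whether the model's answer recovers it. MM-NIAH itself uses multimodal documents; the text-only setup and the stand-in model function here are simplifying assumptions:

```python
# Toy needle-in-a-haystack evaluation; the text-only haystack and the
# keyword-search "model" are stand-ins, not MM-NIAH's actual pipeline.
import random

random.seed(0)
haystack = [f"Filler sentence {i}." for i in range(1000)]
needle = "The secret code is 4271."
haystack.insert(random.randrange(len(haystack)), needle)
document = " ".join(haystack)

def model_answer(doc: str, question: str) -> str:
    # Stand-in for querying an MLLM over the long context.
    for sentence in doc.split("."):
        if "secret code" in sentence:
            return sentence.strip() + "."
    return "not found"

print(model_answer(document, "What is the secret code?"))
```

Real benchmarks of this kind sweep the needle's depth and the document length to map where comprehension degrades.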

Retrieval

AnalogCoder: Analog Circuit Design via Training-Free Code Generation

1 code implementation • 23 May 2024 • Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z. Pan, Ping Luo

Analog circuit design is a significant task in modern chip technology, focusing on the selection of component types, connectivity, and parameters to ensure proper circuit functionality.

Code Generation

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

no code implementations • 13 May 2024 • Mengkang Hu, Haoyu Dong, Ping Luo, Shi Han, Dongmei Zhang

In this paper, we propose to use a knowledge base (KB) as the external knowledge source for TableQA and construct a dataset KET-QA with fine-grained gold evidence annotation.

Question Answering

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

no code implementations • 12 Oct 2023 • Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.

Decision Making · Task Planning

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning · Language Modelling +3

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data

1 code implementation • 25 May 2022 • Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang

Existing auto-regressive pre-trained language models (PLMs), such as T5 and BART, have been successfully applied to table question answering by UNIFIEDSKG and TAPEX, respectively, and have demonstrated state-of-the-art results on multiple benchmarks.
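The "data cube" idea in the title refers to pre-computing aggregates over table columns so a model can copy a number instead of performing the arithmetic itself. The following is a minimal illustration of that general technique; the table, function names, and aggregate set are assumptions, not TaCube's implementation:

```python
# Minimal sketch of pre-computing aggregate "cube" entries over a table;
# the data and API are invented for illustration, not TaCube's code.
from itertools import combinations

rows = [
    {"team": "A", "year": 2020, "points": 10},
    {"team": "A", "year": 2021, "points": 14},
    {"team": "B", "year": 2020, "points": 7},
]

def build_cube(rows, dims, measure):
    """Pre-compute sum/avg of `measure` for every subset of grouping dims."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            grouped = {}
            for row in rows:
                key = tuple(row[d] for d in group)
                grouped.setdefault(key, []).append(row[measure])
            for key, vals in grouped.items():
                cube[(group, key)] = {"sum": sum(vals), "avg": sum(vals) / len(vals)}
    return cube

cube = build_cube(rows, ["team", "year"], "points")
# Total points for team A across all years, looked up rather than computed:
print(cube[(("team",), ("A",))]["sum"])  # 24
```

At question-answering time, a numerical question like "how many points did team A score in total?" becomes a lookup into the cube, sidestepping the PLM's weak arithmetic.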

Question Answering
