1 code implementation • 28 Feb 2024 • Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu
Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively.
no code implementations • 5 Feb 2024 • Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu
Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community, while existing benchmarks primarily focus on understanding simple natural images and short context.
no code implementations • 1 Feb 2024 • Qun Ma, Xiao Xue, Deyu Zhou, Xiangning Yu, Donghua Liu, Xuwen Zhang, Zihan Zhao, Yifan Shen, Peilin Ji, Juanjuan Li, Gang Wang, Wanpeng Ma
These agents, known as LLM-based Agent, offer the potential to enhance the anthropomorphism lacking in ABM.
no code implementations • 26 Jan 2024 • Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu
To this end, we develop ChemDFM, the first LLM towards CGI.
1 code implementation • 25 Aug 2023 • Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu
This design suffers from data leakage problem and lacks the evaluation of subjective Q/A ability.
1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.
1 code implementation • NeurIPS 2023 • Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting the experiences from the past episodes even for different task goals, which excels an LLM-based agent with fixed exemplars or equipped with a transient working memory.
1 code implementation • 14 May 2023 • Danyang Zhang, Hongshen Xu, Zihan Zhao, Lu Chen, Ruisheng Cao, Kai Yu
A GUI task set based on WikiHow app is collected on Mobile-Env to form a benchmark covering a range of GUI interaction capabilities.
no code implementations • 20 Feb 2023 • Zihan Zhao, Yu Wang, Yanfeng Wang
Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.
no code implementations • 26 Sep 2022 • Chuang Liu, Lei Kou, Guowei Cai, Zihan Zhao, Zhe Zhang
Power electronics converters have been widely used in aerospace system, DC transmission, distributed energy, smart grid and so forth, and the reliability of power electronics converters has been a hotspot in academia and industry.
no code implementations • 11 Jul 2022 • Zihan Zhao, Yanfeng Wang, Yu Wang
The research and applications of multimodal emotion recognition have become increasingly popular recently.
1 code implementation • NAACL 2022 • Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu
Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interests.
1 code implementation • EMNLP 2021 • Xingyu Chen, Zihan Zhao, Lu Chen, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu
In this paper, we introduce the task of structural reading comprehension (SRC) on web.
no code implementations • 14 Oct 2020 • Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks.
1 code implementation • 25 Sep 2020 • Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories.
2 code implementations • 16 Aug 2020 • Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.