Search Results for author: Zihan Zhao

Found 16 papers, 9 papers with code

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

1 code implementation 28 Feb 2024 Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu

Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively.

document understanding, Information Retrieval +1

MULTI: Multimodal Understanding Leaderboard with Text and Images

no code implementations 5 Feb 2024 Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu

Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community, while existing benchmarks primarily focus on understanding simple natural images and short context.

In-Context Learning

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

1 code implementation 20 Aug 2023 Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.

Multiple-choice Question Answering

Large Language Models Are Semi-Parametric Reinforcement Learning Agents

1 code implementation NeurIPS 2023 Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu

By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting the experiences from past episodes even for different task goals, which gives it an advantage over an LLM-based agent with fixed exemplars or one equipped with a transient working memory.

Language Modelling, Large Language Model +1

Mobile-Env: An Evaluation Platform and Benchmark for LLM-GUI Interaction

1 code implementation 14 May 2023 Danyang Zhang, Hongshen Xu, Zihan Zhao, Lu Chen, Ruisheng Cao, Kai Yu

A GUI task set based on the WikiHow app is collected on Mobile-Env to form a benchmark covering a range of GUI interaction capabilities.

Language Modelling

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

no code implementations 20 Feb 2023 Zihan Zhao, Yu Wang, Yanfeng Wang

Multimodal emotion recognition is a challenging research area that aims to fuse different modalities to predict human emotion.

Multimodal Emotion Recognition

Review for AI-based Open-Circuit Faults Diagnosis Methods in Power Electronics Converters

no code implementations 26 Sep 2022 Chuang Liu, Lei Kou, Guowei Cai, Zihan Zhao, Zhe Zhang

Power electronics converters have been widely used in aerospace systems, DC transmission, distributed energy, smart grids, and so forth, and their reliability has been a hotspot in academia and industry.

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

no code implementations 11 Jul 2022 Zihan Zhao, Yanfeng Wang, Yu Wang

The research and applications of multimodal emotion recognition have become increasingly popular recently.

Multimodal Emotion Recognition, Transfer Learning

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

no code implementations 14 Oct 2020 Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu

Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks.

Clustering, Quantization

From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation

1 code implementation 25 Sep 2020 Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang

Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories.

Image Classification, Segmentation +3

Context-aware Feature Generation for Zero-shot Semantic Segmentation

2 code implementations 16 Aug 2020 Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang

In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet.

Segmentation, Semantic Segmentation +3
