Search Results for author: Zhiyang Xu

Found 13 papers, 5 papers with code

Multimodal Instruction Tuning with Conditional Mixture of LoRA

no code implementations24 Feb 2024 Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.

Zero-shot Generalization
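
The title above points to LoRA adapters combined as a conditional mixture. As a rough, hedged illustration of that general idea only (module names, shapes, and the routing scheme below are assumptions for this sketch, not the paper's actual design):

```python
# Illustrative sketch: a linear layer with several LoRA experts whose low-rank
# updates are combined by input-conditioned routing weights. This is a generic
# "conditional mixture of LoRA" layer, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)            # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank (A, B) pair per expert: delta_W_e = B_e @ A_e
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)    # input-conditioned routing

    def forward(self, x):                             # x: (batch, d_in)
        gates = F.softmax(self.router(x), dim=-1)     # (batch, num_experts)
        low = torch.einsum("erd,bd->ber", self.A, x)  # project into rank space
        up = torch.einsum("eor,ber->beo", self.B, low)   # project back to d_out
        delta = torch.einsum("be,beo->bo", gates, up)    # mix experts per input
        return self.base(x) + delta
```

The base weight stays frozen and only the low-rank pairs and the router are trained, which is the usual parameter-efficiency argument behind LoRA-style tuning.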

Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning

no code implementations18 Feb 2024 Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within existing VLM frameworks: (1) a lack of task diversity in pretraining and visual instruction tuning, and (2) annotation errors and bias in GPT-4-synthesized instruction tuning data.

Hallucination Visual Question Answering

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects

no code implementations15 Nov 2023 Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang

Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment.

Dialogue Generation Language Modelling +2

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

1 code implementation8 Oct 2023 Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang

Automatically generating scripts (i.e., sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial for modern AI virtual assistants to guide humans through everyday tasks, especially unfamiliar ones.

The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models

1 code implementation24 May 2023 Jingyuan Qi, Zhiyang Xu, Ying Shen, Minqian Liu, Di Jin, Qifan Wang, Lifu Huang

Chain-of-Thought (CoT) prompting enables large language models to solve complex reasoning problems by generating intermediate steps.

Language Modelling Math +2
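
The snippet above refers to plain Chain-of-Thought (CoT) prompting as background; the paper's SOCRATIC QUESTIONING method builds on it with recursive thinking. A minimal sketch of the CoT baseline only (the exemplar and question are toy strings, not taken from the paper):

```python
# Minimal illustration of Chain-of-Thought prompting: a few-shot exemplar shows
# intermediate reasoning steps before the final answer, encouraging the model to
# do the same for a new question. The exemplar and question are toy examples.
COT_EXEMPLAR = (
    "Q: A shop has 3 boxes with 12 apples each. It sells 10 apples. "
    "How many apples are left?\n"
    "A: Let's think step by step. 3 boxes * 12 apples = 36 apples. "
    "36 - 10 = 26. The answer is 26.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked exemplar and a step-by-step cue to a new question."""
    return COT_EXEMPLAR + f"Q: {question}\nA: Let's think step by step."

if __name__ == "__main__":
    print(build_cot_prompt("If a train travels 60 km in 1.5 hours, what is its average speed?"))
```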

AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes

no code implementations24 May 2023 Barry Menglong Yao, Yu Chen, Qifan Wang, Sijia Wang, Minqian Liu, Zhiyang Xu, Licheng Yu, Lifu Huang

We propose attribute-aware multimodal entity linking, where the input is a mention described with text and an image, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) in which each entity is also described with a text description, a visual image, and a set of attributes and values.

Attribute Entity Linking
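
The task definition above fixes the shape of the data: a mention carries text and an image, and each KB entity carries a description, an image, and attribute-value pairs. A small sketch of that structure (field names are illustrative, not the AMELI dataset schema):

```python
# Illustrative data structures for attribute-aware multimodal entity linking.
# Field names are assumptions for readability, not the AMELI dataset schema.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Mention:
    text: str            # textual context describing the mention
    image_path: str      # image in which the mention appears

@dataclass
class KBEntity:
    entity_id: str
    description: str     # textual description of the entity
    image_path: str      # canonical image of the entity
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"brand": "..."}
```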

Iteratively Improving Biomedical Entity Linking and Event Extraction via Hard Expectation-Maximization

no code implementations24 May 2023 Xiaochu Li, Minqian Liu, Zhiyang Xu, Lifu Huang

To address these challenges, we propose joint biomedical entity linking and event extraction by regarding the event structures and entity references in knowledge bases as latent variables and updating the two task-specific models in a hard Expectation-Maximization (EM) fashion: (1) predicting the missing variables for each partially annotated dataset based on the current two task-specific models, and (2) updating the parameters of each model on the corresponding pseudo-completed dataset.

Entity Linking Event Extraction +1
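
The hard-EM alternation described above can be written schematically as follows; the model objects and dataset formats are placeholders, and this is a sketch of the control flow rather than the authors' implementation:

```python
# Schematic hard-EM loop for two partially annotated datasets: the entity-linking
# data lacks event annotations and the event data lacks entity links. Model
# objects (.predict/.fit) and record formats are placeholders for illustration.
def hard_em_training(linking_model, event_model,
                     linking_data, event_data, num_rounds=5):
    for _ in range(num_rounds):
        # Hard E-step: fill in the missing annotations of each partially
        # labeled dataset using the current task-specific models.
        pseudo_event_data = [
            {**ex, "events": event_model.predict(ex)} for ex in linking_data
        ]
        pseudo_linking_data = [
            {**ex, "entity_links": linking_model.predict(ex)} for ex in event_data
        ]
        # M-step: update each model on its pseudo-completed dataset
        # (gold annotations plus the newly predicted ones).
        linking_model.fit(linking_data + pseudo_linking_data)
        event_model.fit(event_data + pseudo_event_data)
    return linking_model, event_model
```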

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

1 code implementation21 Dec 2022 Zhiyang Xu, Ying Shen, Lifu Huang

Our results indicate that fine-tuning the model on a diverse set of tasks and instructions leads to a reduced sensitivity to variations in instructions for each task.

Transfer Learning Zero-Shot Learning
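
One plausible way to make "sensitivity to variations in instructions" concrete is to evaluate the same task under several paraphrased instructions and report the spread of the scores; the sketch below illustrates that idea and is not necessarily the paper's exact metric:

```python
# One simple way to quantify sensitivity to instruction wording: run the same
# task under several paraphrased instructions and report the score spread.
# This is an illustration, not necessarily the metric used in the paper.
from statistics import mean, pstdev
from typing import Callable, List

def instruction_sensitivity(evaluate: Callable[[str], float],
                            instruction_variants: List[str]) -> dict:
    """`evaluate` returns a task score (e.g. accuracy) for a given instruction."""
    scores = [evaluate(inst) for inst in instruction_variants]
    return {"mean": mean(scores), "std": pstdev(scores), "scores": scores}

if __name__ == "__main__":
    # Toy stand-in for a real evaluation loop over a multimodal task.
    fake_scores = {"Describe the image.": 0.71,
                   "What is shown in the picture?": 0.69,
                   "Give a caption for this photo.": 0.70}
    print(instruction_sensitivity(lambda inst: fake_scores[inst],
                                  list(fake_scores)))
```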

Identifying and Measuring Token-Level Sentiment Bias in Pre-trained Language Models with Prompts

no code implementations15 Apr 2022 Apoorv Garg, Deval Srivastava, Zhiyang Xu, Lifu Huang

Due to their superior performance, large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society.

Hyperspectral Image Super-Resolution in Arbitrary Input-Output Band Settings

no code implementations19 Mar 2021 Zhongyang Zhang, Zhiyang Xu, Zia Ahmed, Asif Salekin, Tauhidur Rahman

However, one of the fundamental limitations of these approaches is that they are highly dependent on image and camera settings and can only learn to map an input HSI with one specific setting to an output HSI with another.

Hyperspectral Image Super-Resolution Image Super-Resolution +1

Using BibTeX to Automatically Generate Labeled Data for Citation Field Extraction

1 code implementation AKBC 2020 Dung Thai, Zhiyang Xu, Nicholas Monath, Boris Veytsman, Andrew McCallum

In this paper, we describe a technique for using BibTeX to automatically generate a large-scale labeled dataset (41M labeled strings) that is four orders of magnitude larger than the current largest CFE dataset, namely the UMass Citation Field Extraction dataset [Anzaroot and McCallum, 2013].

Management
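
The core idea described above, rendering BibTeX records into citation strings whose field boundaries are already known, can be illustrated with a toy generator (the rendering template and label names are assumptions, not the paper's pipeline):

```python
# Toy illustration of generating labeled citation-field-extraction data from a
# BibTeX record: render the entry with a simple template and tag each token with
# the field it came from. Template and label names are assumptions for this sketch.
def render_labeled_citation(entry: dict) -> list:
    """Return (token, label) pairs for a simple 'author. title. year.' rendering."""
    labeled = []
    for field in ("author", "title", "year"):
        value = entry.get(field, "")
        for token in value.split():
            labeled.append((token, field))
        labeled.append((".", "punct"))
    return labeled

if __name__ == "__main__":
    entry = {"author": "Anzaroot, Sam and McCallum, Andrew",
             "title": "A New Dataset for Fine-Grained Citation Field Extraction",
             "year": "2013"}
    for token, label in render_labeled_citation(entry):
        print(f"{token}\t{label}")
```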
