Search Results for author: Xiyao Wang

Found 18 papers, 10 papers with code

LLaVA-Critic: Learning to Evaluate Multimodal Models

no code implementations • 3 Oct 2024 • Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li

We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks.

Instruction Following

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

1 code implementation • 19 Jun 2024 • YuHang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive.

Knowledge Distillation

World Models with Hints of Large Language Models for Goal Achieving

no code implementations • 11 Jun 2024 • Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

By assigning higher intrinsic rewards to samples that align with the hints outlined by the language model during model rollouts, DLLM guides the agent toward meaningful and efficient exploration.

Decision Making · Efficient Exploration +2
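Hedged, the mechanism described in the abstract can be pictured as an intrinsic bonus added to the environment reward for rollout samples that resemble the LLM's hints. The embedding representation, cosine-similarity measure, and bonus scale below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def hint_alignment_bonus(sample_embedding, hint_embeddings, scale=1.0):
    # Cosine similarity between a rollout sample and each LLM-provided hint;
    # the closest hint determines the intrinsic bonus.
    sims = []
    for hint in hint_embeddings:
        denom = np.linalg.norm(sample_embedding) * np.linalg.norm(hint) + 1e-8
        sims.append(float(np.dot(sample_embedding, hint)) / denom)
    return scale * max(sims)

def shaped_reward(env_reward, sample_embedding, hint_embeddings, scale=1.0):
    # Reward used for policy learning: extrinsic reward plus the
    # hint-alignment intrinsic bonus.
    return env_reward + hint_alignment_bonus(sample_embedding, hint_embeddings, scale)
```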

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

2 code implementations • 24 May 2024 • Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, YuHang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the need for external models or data.

Hallucination · Image Comprehension +2
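A rough, non-authoritative sketch of such a self-improvement loop is shown below; the `generate`, `self_score`, and `preference_update` methods are assumed placeholders, and the actual SIMA procedure may differ at every step:

```python
def self_improvement_round(model, images, prompts, n_candidates=4):
    # One hypothetical round: the model samples its own candidate responses,
    # critiques them against the image, and the resulting preference pairs are
    # used for a preference-optimization update. No external model or data.
    preference_pairs = []
    for image, prompt in zip(images, prompts):
        candidates = [model.generate(image, prompt) for _ in range(n_candidates)]
        # Self-critique: the same model scores how well each response is
        # grounded in the image (assumed interface).
        scores = [model.self_score(image, prompt, c) for c in candidates]
        ranked = sorted(zip(scores, candidates), key=lambda sc: sc[0])
        chosen, rejected = ranked[-1][1], ranked[0][1]
        preference_pairs.append((image, prompt, chosen, rejected))
    model.preference_update(preference_pairs)  # e.g. a DPO-style update (assumed)
    return model
```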

Calibrated Self-Rewarding Vision Language Models

1 code implementation • 23 May 2024 • Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao

In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input.

Hallucination · Language Modelling +1
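One way to picture a step-wise, visually constrained self-reward is the toy sketch below; the `self_reward` and `visual_grounding` calls and the mixing weight `alpha` are illustrative assumptions, not the paper's formulation:

```python
def calibrated_step_reward(model, image, prompt, prefix, alpha=0.5):
    # Hypothetical step-wise reward: blend the model's own language self-reward
    # with a visual-grounding score so that visual input carries explicit weight.
    language_score = model.self_reward(prompt, prefix)      # assumed interface
    visual_score = model.visual_grounding(image, prefix)    # assumed interface
    return alpha * visual_score + (1 - alpha) * language_score

def rank_candidates(model, image, prompt, candidates):
    # Each candidate is a list of response steps; sum the calibrated reward over
    # growing prefixes and rank candidates by the total.
    totals = []
    for steps in candidates:
        total = sum(
            calibrated_step_reward(model, image, prompt, steps[: i + 1])
            for i in range(len(steps))
        )
        totals.append(total)
    order = sorted(range(len(candidates)), key=lambda i: totals[i], reverse=True)
    return [candidates[i] for i in order]
```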

Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications

no code implementations • 22 Jan 2024 • YuHang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, Wei Ai

Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications.

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

1 code implementation • 19 Jan 2024 • Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.

Language Modelling · Large Language Model +1

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

no code implementations • 11 Oct 2023 • Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang

In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration.

continuous-control · Continuous Control +2
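A minimal sketch of the conservative-rollout / optimistic-exploration idea, using ensemble disagreement as the uncertainty signal; the `predict` interface and the coefficients `lam` (penalty) and `beta` (bonus) are my assumptions rather than COPlanner's exact design:

```python
import numpy as np

def ensemble_step(models, state, action):
    # Each ensemble member predicts (next_state, reward); disagreement across
    # members serves as a cheap model-uncertainty estimate (assumed interface).
    preds = [m.predict(state, action) for m in models]
    next_states = np.stack([p[0] for p in preds])
    rewards = np.array([p[1] for p in preds])
    uncertainty = float(next_states.std(axis=0).mean())
    return next_states.mean(axis=0), float(rewards.mean()), uncertainty

def score_rollout(models, policy, state, action, horizon, lam=1.0, beta=1.0, explore=False):
    # Conservative when scoring rollouts for policy learning (uncertainty is a
    # penalty); optimistic when choosing real-environment actions (uncertainty
    # becomes an exploration bonus).
    total = 0.0
    for _ in range(horizon):
        state, reward, uncertainty = ensemble_step(models, state, action)
        total += reward + (beta * uncertainty if explore else -lam * uncertainty)
        action = policy(state)
    return total
```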

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

1 code implementation • 22 Jun 2023 • Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.

continuous-control · Continuous Control +4
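The temporal, action-driven contrastive loss named in the title can be sketched as an InfoNCE-style objective over latent state-action pairs and future latent states; the encoders, `projector` module, and temperature below are placeholders, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_t, a_seq, z_future, projector, temperature=0.1):
    # InfoNCE-style objective: the latent state-action pair at time t should be
    # predictive of the latent state k steps ahead; other batch entries act as
    # negatives. `projector` is an assumed torch module mapping the concatenated
    # state-action representation into the latent-state space.
    query = projector(torch.cat([z_t, a_seq], dim=-1))         # (B, D)
    logits = query @ z_future.t() / temperature                # (B, B) similarities
    labels = torch.arange(query.size(0), device=query.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```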

Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function

no code implementations • 2 Feb 2023 • Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang

To test this hypothesis, we devise two practical robust training mechanisms, computing adversarial noise and regularizing the value network's spectral norm, to directly regularize the Lipschitz condition of the value functions.

Model-based Reinforcement Learning
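Of the two mechanisms mentioned in the abstract, the spectral-norm regularizer is the easier one to sketch; the penalty form and coefficient below are my assumptions about how such a term might enter the value loss:

```python
import torch
import torch.nn as nn

def spectral_norm_penalty(value_net: nn.Module):
    # Sum of the largest singular values of each linear layer's weight matrix;
    # keeping this small is a simple proxy for bounding the value network's
    # Lipschitz constant.
    penalty = 0.0
    for module in value_net.modules():
        if isinstance(module, nn.Linear):
            penalty = penalty + torch.linalg.matrix_norm(module.weight, ord=2)
    return penalty

# Usage sketch (hypothetical loss assembly):
#   value_loss = td_error.pow(2).mean() + 1e-3 * spectral_norm_penalty(value_net)
```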

Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

1 code implementation • 25 Jul 2022 • Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang

Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning.

continuous-control · Continuous Control +2

Transfer RL across Observation Feature Spaces via Model-Based Regularization

no code implementations • ICLR 2022 • Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang

In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g., an increased number of observable features).

Reinforcement Learning (RL)

Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning

no code implementations • 24 Oct 2020 • Xiyao Wang, Junge Zhang, Wenzhen Huang, Qiyue Yin

We give an upper bound on the trajectory reward estimation error and point out that increasing the agent's exploration ability is the key to reducing trajectory reward estimation error, thereby alleviating the dynamics bottleneck dilemma.

continuous-control · Continuous Control +5
