Search Results for author: Haotian Liu

Found 40 papers, 16 papers with code

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

1 code implementation 26 Jun 2024 ZiRui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs.

Chart Understanding

Fantastic Copyrighted Beasts and How (Not) to Generate Them

no code implementations 20 Jun 2024 Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

Our evaluation systematically shows that both image and video generation models can still generate characters even if characters' names are not explicitly mentioned in the prompt, sometimes with only two generic keywords (e.g., prompting with "videogame, plumber" consistently generates Nintendo's Mario character).

Image Generation Video Generation

Yo'LLaVA: Your Personalized Language and Vision Assistant

no code implementations 13 Jun 2024 Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

In this paper, we introduce the novel task of personalizing LMMs, so that they can have conversations about a specific subject.

Image Captioning Question Answering +1

Carrier Aggregation Enabled MIMO-OFDM Integrated Sensing and Communication

no code implementations 17 May 2024 Haotian Liu, Zhiqing Wei, Jinghui Piao, Huici Wu, Xingwang Li, Zhiyong Feng

The challenges in sensing signal processing introduced by CA include the initial phase misalignment of the echo signals on high and low-frequency bands due to attenuation and radar cross section, and the fusion of the sensing data on high and low-frequency bands with different physical-layer parameters.

Integrated Sensing and Communication Enabled Cooperative Passive Sensing Using Mobile Communication System

no code implementations 15 May 2024 Zhiqing Wei, Haotian Liu, Hujun Li, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ping Zhang

However, multi-BS cooperative passive sensing faces the challenges of mitigating synchronization offsets and fusing sensing information.

Target Localization with Macro and Micro Base Stations Cooperative Sensing

no code implementations 5 May 2024 Haotian Liu, Zhiqing Wei, Furong Yang, Huici Wu, Kaifeng Han, Zhiyong Feng

Addressing the communication and sensing demands of the sixth-generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered traction in academia and industry.

Generalizable Face Landmarking Guided by Conditional Face Warping

1 code implementation CVPR 2024 Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo

Given a pair of real and stylized facial images, the conditional face warper predicts a warping field from the real face to the stylized one, in which the face landmarker predicts the ending points of the warping field and provides us with high-quality pseudo landmarks for the corresponding stylized facial images.

Domain Adaptation

Deep Cooperation in ISAC System: Resource, Node and Infrastructure Perspectives

no code implementations 5 Mar 2024 Zhiqing Wei, Haotian Liu, Zhiyong Feng, Huici Wu, Fan Liu, Qixun Zhang, Yucong Du

This article may provide a deep and comprehensive view of cooperative sensing in ISAC systems to enhance sensing performance, supporting the applications of IoE.

PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds

no code implementations 29 Feb 2024 Haotian Liu, Sanqing Qu, Fan Lu, Zongtao Bu, Florian Roehrbein, Alois Knoll, Guang Chen

Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding.

Depth Prediction Monocular Depth Estimation +2

Edit One for All: Interactive Batch Image Editing

no code implementations CVPR 2024 Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee

With increased human control, it is now possible to edit an image in a plethora of ways: from specifying in text what we want to change, to directly dragging the contents of the image in an interactive point-based manner.

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

no code implementations CVPR 2024 Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.

Visual Commonsense Reasoning Visual Prompting

Integrated Sensing and Communication Signal Processing Based on Compressed Sensing Over Unlicensed Spectrum Bands

no code implementations 4 Oct 2023 Haotian Liu, Zhiqing Wei, Fengyun Li, Yuewei Lin, Hanyang Qu, Huici Wu, Zhiyong Feng

The ISAC-enabled mobile communication system regularly operates in non-continuous spectrum bands due to crowded licensed frequency bands.

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations 25 Sep 2023 Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

Carrier Aggregation Enabled Integrated Sensing and Communication Signal Design and Processing

no code implementations 25 Sep 2023 Zhiqing Wei, Haotian Liu, Xinyi Yang, Wangjun Jiang, Huici Wu, Xingwang Li, Zhiyong Feng

The future mobile communication systems will support intelligent applications such as Internet of Vehicles (IoV) and Extended Reality (XR).

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

1 code implementation 18 Sep 2023 Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen

We find that scaling LMM consistently enhances model performance and improves language capabilities, and that the performance of LoRA/QLoRA tuning of LMM is comparable to that of full-model fine-tuning.

Visual Question Answering

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations 25 Jul 2023 Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking Retrieval

Generate Anything Anywhere in Any Scene

no code implementations 29 Jun 2023 Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields.

Data Augmentation Object

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

1 code implementation NeurIPS 2023 Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images.

Instruction Following Language Modelling +2

Visual Instruction Tuning

10 code implementations NeurIPS 2023 Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.

1 Image, 2*2 Stitching Image Retrieval +4

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

1 code implementation 11 Apr 2023 Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang

Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision, which, however, remains unresolved due to complex distortion conditions and diversified image contents.

Blind Image Quality Assessment Decoder

TMA: Temporal Motion Aggregation for Event-based Optical Flow

1 code implementation ICCV 2023 Haotian Liu, Guang Chen, Sanqing Qu, Yanping Zhang, Zhijun Li, Alois Knoll, Changjun Jiang

In this paper, we argue that temporal continuity is a vital element of event-based optical flow and propose a novel Temporal Motion Aggregation (TMA) approach to unlock its potential.

Event-based Optical Flow Optical Flow Estimation

Reducing Action Space: Reference-Model-Assisted Deep Reinforcement Learning for Inverter-based Volt-Var Control

no code implementations 10 Oct 2022 Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

We find that a large action space increases the learning difficulties of DRL and degrades the optimization performance in the process of generating data and training neural networks.

End-to-End Instance Edge Detection

no code implementations 6 Apr 2022 Xueyan Zou, Haotian Liu, Yong Jae Lee

We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.

Decoder Edge Detection +6

Reducing Learning Difficulties: One-Step Two-Critic Deep Reinforcement Learning for Inverter-based Volt-Var Control

no code implementations 30 Mar 2022 Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun, Wenqi Huang

Then we design a one-step actor-critic DRL scheme, a simplified version of recent DRL algorithms, which successfully avoids the issue of Q-value overestimation.

Masked Discrimination for Self-Supervised Learning on Point Clouds

1 code implementation 21 Mar 2022 Haotian Liu, Mu Cai, Yong Jae Lee

Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.

3D Shape Classification Binary Classification +4

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

1 code implementation 30 Oct 2021 Qing Liu, Haotian Liu, Wei Ke, Yixiong Liang

It reassembles features in a dimension-reduced feature space and simultaneously aggregates multiple features inside a large predefined region into multiple target features.

Lesion Segmentation Segmentation

Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

no code implementations 13 Apr 2021 Haotian Liu, Wenchuan Wu

Such VCC is formulated as a two-timescale optimization problem to jointly optimize FTCDs and STDDs in ADNs.

Reinforcement Learning (RL)

YolactEdge: Real-time Instance Segmentation on the Edge

2 code implementations 22 Dec 2020 Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee

We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.

Real-time Instance Segmentation Semantic Segmentation

Dual-Branch Network with Dual-Sampling Modulated Dice Loss for Hard Exudate Segmentation from Colour Fundus Images

no code implementations 3 Dec 2020 Qing Liu, Haotian Liu, Yixiong Liang

In detail, for the first branch, we use a uniform sampler to sample pixels from the predicted segmentation mask for Dice loss calculation, which naturally biases this branch in favour of large hard exudates, as Dice loss incurs a larger cost for misidentifying large hard exudates than small ones.

Online Multi-agent Reinforcement Learning for Decentralized Inverter-based Volt-VAR Control

no code implementations 23 Jun 2020 Haotian Liu, Wenchuan Wu

In this paper, we propose an online multi-agent reinforcement learning and decentralized control framework (OLDC) for VVC.

Multi-agent Reinforcement Learning reinforcement-learning +1

Universal time delay in static spherically symmetric spacetimes for null and timelike signals

no code implementations 5 Jun 2020 Haotian Liu, Junji Jia

A perturbative method to compute the total travel time of both null and timelike rays in arbitrary static spherically symmetric spacetimes in the weak field limit is proposed.

General Relativity and Quantum Cosmology

Two-stage Deep Reinforcement Learning for Inverter-based Volt-VAR Control in Active Distribution Networks

no code implementations 20 May 2020 Haotian Liu, Wenchuan Wu

In the sequential online stage, we safely transfer the offline agent to serve as the online agent, which performs continuous online learning and control with significantly improved safety and efficiency.

reinforcement-learning Reinforcement Learning (RL)

A Constructive Algorithm for Decomposing a Tensor into a Finite Sum of Orthonormal Rank-1 Terms

1 code implementation 7 Jul 2014 Kim Batselier, Haotian Liu, Ngai Wong

We propose a constructive algorithm that decomposes an arbitrary real tensor into a finite sum of orthonormal rank-1 outer products.

Numerical Analysis
