Search Results for author: Haotian Liu

Found 33 papers, 14 papers with code

Generalizable Face Landmarking Guided by Conditional Face Warping

1 code implementation • 18 Apr 2024 • Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo

Given a pair of real and stylized facial images, the conditional face warper predicts a warping field from the real face to the stylized one, in which the face landmarker predicts the ending points of the warping field and provides us with high-quality pseudo landmarks for the corresponding stylized facial images.

Domain Adaptation

Deep Cooperation in ISAC System: Resource, Node and Infrastructure Perspectives

no code implementations • 5 Mar 2024 • Zhiqing Wei, Haotian Liu, Zhiyong Feng, Huici Wu, Fan Liu, Qixun Zhang

With the mobile communication system evolving into the 6th generation (6G), the Internet of Everything (IoE) is becoming a reality, connecting humans, big data, and intelligent machines to support intelligent decision making and reconfiguring traditional industries and human life.

PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds

no code implementations • 29 Feb 2024 • Haotian Liu, Sanqing Qu, Fan Lu, Zongtao Bu, Florian Roehrbein, Alois Knoll, Guang Chen

Therefore, existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding.

Depth Prediction Monocular Depth Estimation +2

Edit One for All: Interactive Batch Image Editing

no code implementations • 18 Jan 2024 • Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee

With increased human control, it is now possible to edit an image in a plethora of ways; from specifying in text what we want to change, to straight up dragging the contents of the image in an interactive point-based manner.

Making Large Multimodal Models Understand Arbitrary Visual Prompts

no code implementations • 1 Dec 2023 • Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee

Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.

Visual Commonsense Reasoning Visual Prompting

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

1 code implementation • 9 Nov 2023 • Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models.

Ranked #1 on LMM real-life tasks on Leaderboard

Instruction Following LLM real-life tasks +3

621

Improved Baselines with Visual Instruction Tuning

5 code implementations • 5 Oct 2023 • Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning.

Ranked #3 on visual instruction following on LLaVA-Bench

Factual Inconsistency Detection in Chart Captioning visual instruction following +1

124,984

Integrated Sensing and Communication Signal Processing Based on Compressed Sensing Over Unlicensed Spectrum Bands

no code implementations • 4 Oct 2023 • Haotian Liu, Zhiqing Wei, Fengyun Li, Yuewei Lin, Hanyang Qu, Huici Wu, Zhiyong Feng

ISAC-enabled mobile communication systems regularly operate in non-continuous spectrum bands because licensed frequency bands are crowded.

Carrier Aggregation Enabled Integrated Sensing and Communication Signal Design and Processing

no code implementations • 25 Sep 2023 • Zhiqing Wei, Haotian Liu, Xinyi Yang, Wangjun Jiang, Huici Wu, Xingwang Li, Zhiyong Feng

Future mobile communication systems will support intelligent applications such as Internet of Vehicles (IoV) and Extended Reality (XR).

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination Image Captioning +1

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

1 code implementation • 18 Sep 2023 • Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen

We find that scaling LMM consistently enhances model performance and improves language capabilities, and that the performance of LoRA/QLoRA tuning of LMM is comparable to that of full-model fine-tuning.

Ranked #47 on Visual Question Answering on MM-Vet

Visual Question Answering

16,101

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking Retrieval

Generate Anything Anywhere in Any Scene

no code implementations • 29 Jun 2023 • Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields.

Data Augmentation Object

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

no code implementations • NeurIPS 2023 • Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images.

Instruction Following Language Modelling +2

Visual Instruction Tuning

9 code implementations • NeurIPS 2023 • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.

Ranked #4 on Visual Question Answering on BenchLMM

Video Question Answering visual instruction following +2

124,984

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

1 code implementation • 11 Apr 2023 • Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang

Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision, which however remains unresolved due to the complex distortion conditions and diversified image contents.

Blind Image Quality Assessment

TMA: Temporal Motion Aggregation for Event-based Optical Flow

1 code implementation • ICCV 2023 • Haotian Liu, Guang Chen, Sanqing Qu, Yanping Zhang, Zhijun Li, Alois Knoll, Changjun Jiang

In this paper, we argue that temporal continuity is a vital element of event-based optical flow and propose a novel Temporal Motion Aggregation (TMA) approach to unlock its potential.

Event-based Optical Flow Optical Flow Estimation

GLIGEN: Open-Set Grounded Text-to-Image Generation

1 code implementation • CVPR 2023 • Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee

Large-scale text-to-image diffusion models have made amazing advances.

Ranked #4 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Image Inpainting

1,790

Learning Customized Visual Models with Retrieval-Augmented Knowledge

1 code implementation • CVPR 2023 • Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li

Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability.

Ranked #1 on Semi-Supervised Image Classification on ImageNet - 1% labeled data (using extra training data)

Contrastive Learning Retrieval +3

117

Reducing Action Space: Reference-Model-Assisted Deep Reinforcement Learning for Inverter-based Volt-Var Control

no code implementations • 10 Oct 2022 • Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

We observe that a large action space increases the learning difficulties of DRL and degrades the optimization performance in the process of generating data and training neural networks.

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

8 code implementations • 19 Apr 2022 • Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks.

Ranked #1 on Object Detection on ELEVATER

Fairness Few-Shot Image Classification +4

1,957

End-to-End Instance Edge Detection

no code implementations • 6 Apr 2022 • Xueyan Zou, Haotian Liu, Yong Jae Lee

We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.

Edge Detection Instance Segmentation +5

Reducing Learning Difficulties: One-Step Two-Critic Deep Reinforcement Learning for Inverter-based Volt-Var Control

no code implementations • 30 Mar 2022 • Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun, Wenqi Huang

Then we design the one-step actor-critic DRL scheme, a simplified version of recent DRL algorithms that successfully avoids the issue of Q-value overestimation.

Masked Discrimination for Self-Supervised Learning on Point Clouds

1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee

Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.

Ranked #12 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)

3D Shape Classification Binary Classification +4

M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images

1 code implementation • 30 Oct 2021 • Qing Liu, Haotian Liu, Wei Ke, Yixiong Liang

It reassembles features in a dimension-reduced feature space and simultaneously aggregates multiple features inside a large predefined region into multiple target features.

Lesion Segmentation Segmentation

Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

no code implementations • 13 Apr 2021 • Haotian Liu, Wenchuan Wu

Such VCC is formulated as a two-timescale optimization problem to jointly optimize FTCDs and STDDs in ADNs.

Reinforcement Learning (RL)

YolactEdge: Real-time Instance Segmentation on the Edge

2 code implementations • 22 Dec 2020 • Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee

We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.

Real-time Instance Segmentation Semantic Segmentation

1,258

Dual-Branch Network with Dual-Sampling Modulated Dice Loss for Hard Exudate Segmentation from Colour Fundus Images

no code implementations • 3 Dec 2020 • Qing Liu, Haotian Liu, Yixiong Liang

In detail, for the first branch, we use a uniform sampler to sample pixels from the predicted segmentation mask for Dice loss calculation, which naturally biases this branch in favour of large hard exudates, as Dice loss generates a larger cost on misidentification of large hard exudates than of small hard exudates.

Online Multi-agent Reinforcement Learning for Decentralized Inverter-based Volt-VAR Control

no code implementations • 23 Jun 2020 • Haotian Liu, Wenchuan Wu

In this paper, we propose an online multi-agent reinforcement learning and decentralized control framework (OLDC) for VVC.

Multi-agent Reinforcement Learning reinforcement-learning +1

Universal time delay in static spherically symmetric spacetimes for null and timelike signals

no code implementations • 5 Jun 2020 • Haotian Liu, Junji Jia

A perturbative method is proposed to compute the total travel time of both null and timelike rays in arbitrary static spherically symmetric spacetimes in the weak field limit.

General Relativity and Quantum Cosmology

Two-stage Deep Reinforcement Learning for Inverter-based Volt-VAR Control in Active Distribution Networks

no code implementations • 20 May 2020 • Haotian Liu, Wenchuan Wu

In the sequential online stage, we transfer the offline agent safely as the online agent to perform continuous learning and controlling online with significantly improved safety and efficiency.

reinforcement-learning Reinforcement Learning (RL)

Using Deep Learning and Machine Learning to Detect Epileptic Seizure with Electroencephalography (EEG) Data

no code implementations • 6 Oct 2019 • Haotian Liu, Lin Xi, Ying Zhao, Zhixiang Li

The prediction of epileptic seizures has always been extremely challenging in the medical domain.

BIG-bench Machine Learning EEG

A Constructive Algorithm for Decomposing a Tensor into a Finite Sum of Orthonormal Rank-1 Terms

1 code implementation • 7 Jul 2014 • Kim Batselier, Haotian Liu, Ngai Wong

We propose a constructive algorithm that decomposes an arbitrary real tensor into a finite sum of orthonormal rank-1 outer products.

Numerical Analysis
