Search Results for author: Yu Liu

Found 394 papers, 151 papers with code

SOM-NCSCM : An Efficient Neural Chinese Sentence Compression Model Enhanced with Self-Organizing Map

no code implementations EMNLP 2021 Kangli Zi, Shi Wang, Yu Liu, Jicun Li, Yanan Cao, Cungen Cao

Sentence Compression (SC), which aims to shorten sentences while retaining important words that express the essential meanings, has been studied for many years in many languages, especially in English.

Question Answering Sentence +2

More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning

1 code implementation ECCV 2020 Yu Liu, Sarah Parisot, Gregory Slabaugh, Xu Jia, Ales Leonardis, Tinne Tuytelaars

Since those regularization strategies are mostly associated with classifier outputs, we propose a MUlti-Classifier (MUC) incremental learning paradigm that integrates an ensemble of auxiliary classifiers to estimate more effective regularization constraints.

Incremental Learning

A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction

no code implementations 16 Jun 2025 Yi Wang, Zhenghong Wang, Fan Zhang, Chengling Tang, Chaogui Kang, Di Zhu, Zhongfu Ma, Sijie Ruan, Weiyu Zhang, Yu Zheng, Philip S. Yu, Yu Liu

Specifically, it (1) estimates two spatially explicit mass parameters based on inflow and outflow, (2) models the likelihood of cross-unit interaction using closed-form solutions of spatial interactions to constrain spatial modeling randomness, and (3) utilizes the learned spatial interaction to guide and mitigate the over-smoothing phenomenon in transformer attention matrices.

Domain Switching on the Pareto Front: Multi-Objective Deep Kernel Learning in Automated Piezoresponse Force Microscopy

no code implementations 9 Jun 2025 Yu Liu, Utkarsh Pratiush, Kamyar Barakati, Hiroshi Funakubo, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Sergei V. Kalinin

Ferroelectric polarization switching underpins the functional performance of a wide range of materials and devices, yet its dependence on complex local microstructural features renders systematic exploration by manual or grid-based spectroscopic measurements impractical.

Active Learning Combinatorial Optimization

AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers

no code implementations 3 Jun 2025 Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong

We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources.

Sound Source Localization

OpenCarbon: A Contrastive Learning-based Cross-Modality Neural Approach for High-Resolution Carbon Emission Prediction Using Open Data

1 code implementation 3 Jun 2025 Jinwei Zeng, Yu Liu, Guozhen Zhang, Jingtao Ding, Yuming Lin, Jian Yuan, Yong Li

Our model, OpenCarbon, features two major designs that target the challenges: a cross-modality information extraction and fusion module to extract complementary functionality information from two modules and model their interactions, and a neighborhood-informed aggregation module to capture the spatial contiguity correlations.

Contrastive Learning

Exploring Domain Wall Pinning in Ferroelectrics via Automated High Throughput AFM

no code implementations 29 May 2025 Kamyar Barakati, Yu Liu, Hiroshi Funakubo, Sergei V. Kalinin

Domain-wall dynamics in ferroelectric materials are strongly position-dependent since each polar interface is locked into a unique local microstructure.

A Tool for Generating Exceptional Behavior Tests With Large Language Models

1 code implementation 28 May 2025 Linghan Zhong, Samuel Yuan, Jiyang Zhang, Yu Liu, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

Exceptional behavior tests (EBTs) are crucial in software development for verifying that code correctly handles unwanted events and throws appropriate exceptions.

Language Modeling Language Modelling +1

SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

no code implementations 25 May 2025 Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You

Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks.

Video Editing Video Generation

Emotional Supporters often Use Multiple Strategies in a Single Turn

no code implementations 21 May 2025 Xin Bai, Guanyi Chen, Tingting He, Chenlian Zhou, Yu Liu

We formally redefine the ESC task to account for this, proposing a revised formulation that requires generating the full sequence of strategy-utterance pairs given a dialogue history.

SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis

1 code implementation 20 May 2025 Yu Liu, Weiyao Tao, Tong Xia, Simon Knight, Tingting Zhu

To bridge this gap, in this work, we introduce SurvUnc, a novel meta-model based framework for post-hoc uncertainty quantification for survival models.

Benchmarking Model Optimization +2

Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts

no code implementations 20 May 2025 Xi Chen, Shen Yan, Juelin Zhu, Chen Chen, Yu Liu, Maojun Zhang

Existing methods predominantly rely on domain adaptation and generalization strategies, often utilizing small-scale models that exhibit limited performance.

Domain Generalization Land Cover Classification +1

M3Depth: Wavelet-Enhanced Depth Estimation on Mars via Mutual Boosting of Dual-Modal Data

no code implementations 20 May 2025 Junjie Li, Jiawei Wang, Miyu Li, Yu Liu, Yumei Wang, Haitao Xu

Depth estimation can play an important role in obstacle avoidance and navigation for further Mars exploration missions.

Stereo Depth Estimation Stereo Matching

MPMA: Preference Manipulation Attack Against Model Context Protocol

no code implementations 16 May 2025 Zihan Wang, Hongwei Li, Rui Zhang, Yu Liu, Wenbo Jiang, Wenshu Fan, Qingchuan Zhao, Guowen Xu

To achieve MPMA, we first design a Direct Preference Manipulation Attack (DPMA) that achieves significant effectiveness by inserting manipulative words and phrases into the tool name and description.

Fairness model

Seed1.5-VL Technical Report

no code implementations 11 May 2025 Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, PengFei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei LI, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, ZiHao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.

Mixture-of-Experts Multimodal Reasoning +2

ICNN-enhanced 2SP: Leveraging input convex neural networks for solving two-stage stochastic programming

1 code implementation 8 May 2025 Yu Liu, Fabricio Oliveira

Existing learning-based methods like Neural Two-Stage Stochastic Programming (Neur2SP) employ neural networks (NNs) as recourse function surrogates but rely on computationally intensive mixed-integer programming (MIP) formulations.

Decision Making Under Uncertainty

From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection

no code implementations 6 May 2025 Guoting Wei, Yu Liu, Xia Yuan, Xizhe Xue, Linlin Guo, Yifan Yang, Chunxia Zhao, Zongwen Bai, Haokui Zhang, Rong Xiao

Using this label engine, we expand existing aerial detection datasets with rich textual annotations and construct a novel benchmark dataset, called Multi-instance Open-set Aerial Dataset (MI-OAD), addressing the limitations of current remote sensing grounding data and enabling effective open-set aerial detection.

Sentence

PhysioSync: Temporal and Cross-Modal Contrastive Learning Inspired by Physiological Synchronization for EEG-Based Emotion Recognition

1 code implementation 24 Apr 2025 Kai Cui, Jia Li, Yu Liu, Xuesong Zhang, Zhenzhen Hu, Meng Wang

Besides, it introduces Long- and Short-Term Temporal Contrastive Learning (LS-TCL) to capture emotional synchronization at different temporal resolutions within modalities.

Contrastive Learning EEG +1

ADT: Tuning Diffusion Models with Adversarial Supervision

no code implementations 15 Apr 2025 Dazhong Shen, Guanglu Song, Yi Zhang, Bingqi Ma, Lujundong Li, Dongzhi Jiang, Zhuofan Zong, Yu Liu

To address this problem, we propose an intuitive but effective fine-tuning framework, called Adversarial Diffusion Tuning (ADT), by stimulating the inference process during optimization and aligning the final outputs with training data by adversarial supervision.

Denoising Image Generation

The Power of the Pareto Front: Balancing Uncertain Rewards for Adaptive Experimentation in scanning probe microscopy

no code implementations 9 Apr 2025 Yu Liu, Sergei V. Kalinin

Automated experimentation has the potential to revolutionize scientific discovery, but its effectiveness depends on well-defined optimization targets, which are often uncertain or probabilistic in real-world settings.

Bayesian Optimization Decision Making +1

High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning

no code implementations 28 Mar 2025 Dailan He, Xiahong Wang, Shulun Wang, Guanglu Song, Bingqi Ma, Hao Shao, Yu Liu, Hongsheng Li

Face swapping aims to seamlessly transfer a source facial identity onto a target while preserving target attributes such as pose and expression.

Attribute Face Swapping

Targetless 6DoF Calibration of LiDAR and 2D Scanning Radar Based on Cylindrical Occupancy

no code implementations 21 Mar 2025 Weimin WANG, Yu Du, Ting Yang, Yu Liu

Consequently, a cost function involving extrinsic calibration parameters is formulated based on the spatial overlap of 3D grids and LiDAR points.

Autonomous Vehicles

ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing

no code implementations 18 Mar 2025 Yulin Pan, Xiangteng He, Chaojie Mao, Zhen Han, Zeyinzi Jiang, Jingfeng Zhang, Yu Liu

In this paper, we propose ICE-Bench, a unified and comprehensive benchmark designed to rigorously assess image generation models.

Image Generation

Federated Mixture-of-Expert for Non-Overlapped Cross-Domain Sequential Recommendation

no code implementations 17 Mar 2025 Yu Liu, Hanbin Jiang, Lei Zhu, Yu Zhang, Yuqi Mao, Jiangxia Cao, Shuchao Pang

In the real world, users always have multiple interests while surfing different services to enrich their daily lives, e.g., watching hot short videos/live streamings.

Federated Learning Privacy Preserving +1

VACE: All-in-One Video Creation and Editing

2 code implementations 10 Mar 2025 Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, Yu Liu

Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation.

All Video Editing +1

OT-DETECTOR: Delving into Optimal Transport for Zero-shot Out-of-Distribution Detection

no code implementations 9 Mar 2025 Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li

Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Rethinking Video Tokenization: A Conditioned Diffusion-based Approach

1 code implementation 5 Mar 2025 Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan

Trained using only a basic MSE diffusion loss for reconstruction, along with KL term and LPIPS perceptual loss from scratch, extensive experiments demonstrate that CDT achieves state-of-the-art performance in video reconstruction tasks with just a single-step sampling.

Decoder Video Compression +2

Molecule Generation for Target Protein Binding with Hierarchical Consistency Diffusion Model

1 code implementation 2 Mar 2025 Guanlue Li, Chenran Jiang, Ziqi Gao, Yu Liu, Chenyang Liu, Jiean Chen, Yong Huang, Jia Li

Effective generation of molecular structures, or new chemical entities, that bind to target proteins is crucial for lead identification and optimization in drug discovery.

Drug Design Drug Discovery +1

ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

1 code implementation 26 Feb 2025 Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, Siyuan Huang

Existing methods often fail to effectively integrate information across different object states, limiting the accuracy of part-mesh reconstruction and part dynamics modeling, particularly for complex multi-part articulated objects.

parameter estimation

Rewards-based image analysis in microscopy

no code implementations 23 Feb 2025 Kamyar Barakati, Yu Liu, Utkarsh Pratiush, Boris N. Slautin, Sergei V. Kalinin

They can function as wrappers over classical and DCNN-based methods, making them applicable to both unsupervised and supervised workflows (e.g., classification, regression for structure-property mapping) across imaging and hyperspectral data.

Decision Making Denoising +5

Transfer learning in Scalable Graph Neural Network for Improved Physical Simulation

no code implementations 7 Feb 2025 Siqi Shen, Yu Liu, Daniel Biggs, Omar Hafez, Jiandong Yu, Wentao Zhang, Bin Cui, Jiulong Shan

To enable the transfer learning between differently configured SGUNETs, we propose a set of mapping functions to align the parameters between the pre-trained model and the target model.

Graph Neural Network Physical Simulations +1

MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction

no code implementations 5 Feb 2025 Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, HUI ZHANG, Weiming Li, Shu Zhao, Yu Liu

To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction.

Autonomous Driving

RegionGCN: Spatial-Heterogeneity-Aware Graph Convolutional Networks

no code implementations 29 Jan 2025 Hao Guo, Han Wang, Di Zhu, Lun Wu, A. Stewart Fotheringham, Yu Liu

However, current geographically weighting approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy.

Ensemble Learning

DiffDoctor: Diagnosing Image Diffusion Models Before Treating

no code implementations 21 Jan 2025 Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao

In spite of the recent progress, image diffusion models still produce artifacts.

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization

no code implementations 16 Jan 2025 Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, Wei Zhai, Yang Cao, Zheng-Jun Zha

Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity.

Colorization Optical Flow Estimation

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

no code implementations 12 Jan 2025 Ruizhe Ou, Yuan Hu, Fan Zhang, Jiaxin Chen, Yu Liu

In addition, to address the absence of large-scale datasets for training pixel-level RS MLLMs, we construct the GeoPixInstruct dataset, comprising 65,463 images and 140,412 instances, with each instance annotated with text descriptions, bounding boxes, and masks.

Image Captioning Language Modeling +7

Data driven discovery of human mobility models

1 code implementation 10 Jan 2025 Hao Guo, Weiyu Zhang, Junjie Yang, Yuanqiao Hou, Lei Dong, Yu Liu

However, for decades new mathematical formulas to model mobility phenomena have been scarce and usually discovered by analogy to physical processes, such as the gravity model and the radiation model.

Symbolic Regression

ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling

no code implementations 5 Jan 2025 Chaojie Mao, Jingfeng Zhang, Yulin Pan, Zeyinzi Jiang, Zhen Han, Yu Liu, Jingren Zhou

There are many models in the community based on the post-training of text-to-image foundational models that meet this training paradigm of the first stage.

Image Generation

MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception

no code implementations 2 Jan 2025 Xiaoshuai Hao, Guanqun Liu, YuTing Zhao, Yuheng Ji, Mengchuan Wei, Haimei Zhao, Lingdong Kong, Rong Yin, Yu Liu

Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction.

3D Object Detection Autonomous Driving +3

Decompositional Neural Scene Reconstruction with Generative Diffusion Prior

no code implementations CVPR 2025 Junfeng Ni, Yu Liu, Ruijie Lu, Zirui Zhou, Song-Chun Zhu, Yixin Chen, Siyuan Huang

To this end, we propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.

Object Reconstruction

INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models

no code implementations 28 Dec 2024 Di Jin, Xing Liu, Yu Liu, Jia Qing Yap, Andrea Wong, Adriana Crespo, Qi Lin, Zhiyuan Yin, Qiang Yan, Ryan Ye

The rapid development of large language models (LLMs) and large vision models (LVMs) has propelled the evolution of multi-modal AI systems, which have demonstrated remarkable potential for industrial applications by emulating human-like cognition.

Fairness Image Generation

Automated Materials Discovery Platform Realized: Scanning Probe Microscopy of Combinatorial Libraries

no code implementations 24 Dec 2024 Yu Liu, Rohit Pant, Ichiro Takeuchi, R. Jackson Spurling, Jon-Paul Maria, Maxim Ziatdinov, Sergei V. Kalinin

Here we demonstrate the implementation of fully automated SPM to explore the evolution of ferroelectric properties in combinatorial libraries, focusing on Sm-doped BiFeO3 and ZnxMg1-xO systems.

Bayesian Optimization

Color Enhancement for V-PCC Compressed Point Cloud via 2D Attribute Map Optimization

no code implementations 19 Dec 2024 Jingwei Bao, Yu Liu, Zeliang Li, Shuyuan Zhu, Siu-Kei Au Yeung

This paper introduces a framework designed to enhance the color quality in the V-PCC compressed point clouds.

Attribute Transfer Learning

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

1 code implementation 17 Dec 2024 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Chen Liang, Tong Shen, Han Zhang, Huanzhang Dou, Yu Liu, Jingren Zhou

Building upon this foundation, we present ChatDiT, a zero-shot, general-purpose, and interactive visual generation framework that leverages pretrained diffusion transformers in their original form, requiring no additional tuning, adapters, or modifications.

Articles Form

Smartphone-based Iris Recognition through High-Quality Visible Spectrum Iris Capture

no code implementations 17 Dec 2024 Naveenkumar G Venkataswamy, Yu Liu, Surendra Singh, Soumyabrata Dey, Stephanie Schuckers, Masudul H Imtiaz

However, a thorough study of iris recognition using smartphone-captured 'High-Quality' VIS images and cross-spectral matching with previously enrolled NIR images has not been conducted.

Iris Recognition TAR

IDEA-Bench: How Far are Generative Models from Professional Designing?

1 code implementation CVPR 2025 Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, Yu Liu

Real-world design tasks - such as picture book creation, film storyboard development using character sets, photo retouching, visual effects, and font transfer - are highly diverse and complex, requiring deep interpretation and extraction of various elements from instructions, descriptions, and reference images.

Large Language Model Multimodal Large Language Model +1

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

no code implementations 15 Dec 2024 Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li

Our approach effectively mitigates key challenges in video face swapping, including temporal flickering, identity preservation, and robustness to occlusions and pose variations.

3D Reconstruction Attribute +3

GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection

no code implementations 15 Dec 2024 Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He

To further enhance performance, we propose sparsely coded residuals to reduce the errors caused by low-rank approximation on the first- and second-order moments of the optimizers and weight updates.

Arithmetic Reasoning Text Generation

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

no code implementations 12 Dec 2024 Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li

To effectively exploit consistent visual elements within multiple images, we leverage the multi-image comprehension and instruction-following capabilities of the multimodal large language model (MLLM), prompting it to capture consistent visual elements based on the instruction.

Image Comprehension Image Generation +4

See Further When Clear: Curriculum Consistency Model

no code implementations CVPR 2025 Yunpeng Liu, Boxiao Liu, Yi Zhang, Xingzhong Hou, Guanglu Song, Yu Liu, Haihang You

Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity of this curriculum, then ensure that the curriculum maintains consistent learning complexity across different timesteps by having the teacher model iterate more steps when the noise intensity is low.

model

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

no code implementations 4 Dec 2024 Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang

We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments.

Zero-shot Generalization

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

1 code implementation 2 Dec 2024 Jinouwen Zhang, Rongkun Xue, Yazhe Niu, Yun Chen, Jing Yang, Hongsheng Li, Yu Liu

However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models.

Density Estimation Offline RL +3

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

no code implementations 29 Nov 2024 Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang

Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored.

Representation Learning

Reward driven workflows for unsupervised explainable analysis of phases and ferroic variants from atomically resolved imaging data

no code implementations 19 Nov 2024 Kamyar Barakati, Yu Liu, Chris Nelson, Maxim A. Ziatdinov, Xiaohang Zhang, Ichiro Takeuchi, Sergei V. Kalinin

We demonstrate that a reward-driven approach can be used to optimize these key hyperparameters across the full workflow, where rewards were designed to reflect domain wall continuity and straightness, ensuring that the analysis aligns with the material's physical behavior.

MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion

no code implementations 19 Nov 2024 Yu Liu, Ruowei Wang, Jiaqi Li, Zixiang Xu, Qijun Zhao

The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models.

3D Reconstruction Attribute +1

Infrared-Assisted Single-Stage Framework for Joint Restoration and Fusion of Visible and Infrared Images under Hazy Conditions

no code implementations 16 Nov 2024 Huafeng Li, Jiaqi Fang, Yafei Zhang, Yu Liu

To address this, we propose a joint learning framework that utilizes infrared images for the restoration and fusion of hazy IR-VIS images.

Improved Video VAE for Latent Video Diffusion Model

no code implementations CVPR 2025 Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

Specifically, the KTC architecture divides the latent space into two branches, in which one half completely inherits the compression prior of keyframes from a lower-dimension image VAE while the other half involves temporal compression through 3D group causal convolution, reducing temporal-spatial conflicts and accelerating the convergence speed of video VAE.

model Video Reconstruction

Generalizing Hyperedge Expansion for Hyper-relational Knowledge Graph Modeling

no code implementations 9 Nov 2024 Yu Liu, Shu Yang, Jingtao Ding, Quanming Yao, Yong Li

To tackle this issue, in this paper, we generalize the hyperedge expansion in hypergraph learning and propose an equivalent transformation for HKG modeling, referred to as TransEQ.

Attribute Decoder

Not Just Object, But State: Compositional Incremental Learning without Forgetting

1 code implementation 4 Nov 2024 Yanyi Zhang, Binglin Qiu, Qi Jia, Yu Liu, Ran He

Most incremental learners excessively prioritize coarse classes of objects while neglecting various kinds of states (e.g., color and material) attached to the objects.

Diversity Incremental Learning +2

In-Context LoRA for Diffusion Transformers

1 code implementation 31 Oct 2024 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou

While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.

Image Generation

Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN

no code implementations 29 Oct 2024 Zhilun Zhou, Jingyang Fan, Yu Liu, Fengli Xu, Depeng Jin, Yong Li

Motivated by the remarkable abilities of large language models (LLMs) in commonsense reasoning, embedding, and multi-agent collaboration, in this work, we synergize LLM agents and knowledge graph for socioeconomic prediction.

Graph Representation Learning Prediction

NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking

1 code implementation 27 Oct 2024 Yu Liu, Arif Mahmood, Muhammad Haris Khan

To this end, this paper presents NT-VOT211, a new benchmark tailored for evaluating visual object tracking algorithms in the challenging night-time conditions.

Video Object Tracking Visual Object Tracking

STAR-RIS-Enabled Full-Duplex Integrated Sensing and Communication System

no code implementations 24 Oct 2024 Yu Liu, Gaojie Chen, Yun Wen, Qu Luo, Chiya Zhang, Dusit Niyato

With the challenging limitations of traditional SIC approaches, this paper proposes a novel simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-enabled FD ISAC system, where STAR-RIS enhances simultaneous communication and target sensing and reduces self-interference (SI) to a level comparable to traditional SIC approaches.

Integrated sensing and communication ISAC

Group Diffusion Transformers are Unsupervised Multitask Learners

no code implementations 19 Oct 2024 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Huanzhang Dou, Yupeng Shi, Yutong Feng, Chen Liang, Yu Liu, Jingren Zhou

In this work, we introduce Group Diffusion Transformers (GDTs), a novel framework that unifies diverse visual generation tasks by redefining them as a group generation problem.

Articles Colorization +1

LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment

1 code implementation 16 Oct 2024 Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang

LoD-Loc mainly achieves this goal by aligning the wireframe derived from the LoD projected model with that predicted by the neural network.

Visual Localization

Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

1 code implementation 15 Oct 2024 Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue

Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in visual language models (VLMs) have pushed this enthusiasm to new heights.

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

1 code implementation 11 Oct 2024 Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu

Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics.

Autonomous Vehicles motion prediction +3

GLRT-Based Metric Learning for Remote Sensing Object Retrieval

no code implementations 8 Oct 2024 Linping Zhang, Yu Liu, Xueqian Wang, Gang Li, You He

We reorganize datasets for CBRSOR tasks based on fine-grained ship remote sensing image slices (FGSRSI-23) and military aircraft recognition (MAR20) datasets.

Clustering Metric Learning +1

Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification

1 code implementation 3 Oct 2024 Yunchuan Guan, Yu Liu, Ketong Liu, Ke Zhou, Zhiqi Shen

Based on the above conclusion, we argue that meta-learning has a promising future in the unsupervised area, and thus propose DHM-UHT, a dynamic head meta-learning algorithm with unsupervised heterogeneous task construction.

Few-Shot Learning

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

no code implementations 2 Oct 2024 Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan

Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions.

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

no code implementations 30 Sep 2024 Zhen Han, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang, Chaojie Mao, ChenWei Xie, Yu Liu, Jingren Zhou

To comprehensively evaluate the performance of our model, we establish a benchmark of manually annotated pairs data across a variety of visual generation tasks.

All Large Language Model

ControlEdit: A MultiModal Local Clothing Image Editing Method

1 code implementation 23 Sep 2024 Di Cheng, Yingjie Shi, ShiXin Sun, JiaFu Zhang, WeiJing Wang, Yu Liu

Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions, which effectively improves the work efficiency of designers and reduces the threshold for user design.

Self-Supervised Learning

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

no code implementations 19 Sep 2024 Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Chaoyou Fu, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li

We further present an error analysis showing that current LMMs still struggle to fully grasp multimodal search tasks, and conduct an ablation study indicating the potential of scaling test-time computation for AI search engines.

Benchmarking

Valuation Model of Chinese Convertible Bonds Based on Monte Carlo Simulation

no code implementations 10 Sep 2024 Yu Liu

We tackle the problem of pricing Chinese convertible bonds (CCBs) using Monte Carlo simulation and dynamic programming.

Discovering Cyclists' Visual Preferences Through Shared Bike Trajectories and Street View Images Using Inverse Reinforcement Learning

1 code implementation 5 Sep 2024 Kezhou Ren, Meihan Jin, Huiming Liu, Yongxi Gong, Yu Liu

We find that cyclists focus on specific street visual elements when making route decisions, which can be summarized as their attention to safety, street enclosure, and cycling comfort.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI)

OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

no code implementations 22 Aug 2024 Guoting Wei, Xia Yuan, Yu Liu, Zhenhao Shang, Kelu Yao, Chao Li, Qingsen Yan, Chunxia Zhao, Haokui Zhang, Rong Xiao

Then, we propose Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention fusion encoder and a multi-level text-guided Fusion Decoder.

Decoder object-detection +1

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

no code implementations 13 Aug 2024 Yu Liu, Baoxiong Jia, Yixin Chen, Siyuan Huang

The ability to distill object-centric abstractions from intricate visual scenes underpins human-level generalization.

Novel View Synthesis Object

Contrastive masked auto-encoders based self-supervised hashing for 2D image and 3D point cloud cross-modal retrieval

no code implementations 11 Aug 2024 Rukai Wei, Heng Cui, Yu Liu, Yufeng Hou, Yanzhao Xie, Ke Zhou

Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics and effectively bridge the modality gap between 2D and 3D.

Contrastive Learning Cross-Modal Retrieval +1

Machine Learning-Based Reward-Driven Tuning of Scanning Probe Microscopy: Towards Fully Automated Microscopy

no code implementations 7 Aug 2024 Yu Liu, Roger Proksch, Jason Bemis, Utkarsh Pratiush, Astita Dubey, Mahshid Ahmadi, Reece Emery, Philip D. Rack, Yu-Chen Liu, Jan-Chi Yang, Sergei V. Kalinin

This automated workflow gives optimal scanning parameters for different probes and samples and gives high-quality SPM images consistently in the attractive mode.

Decision Making

Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting

no code implementations 29 Jul 2024 Shiyu Wang, Zhixuan Chu, Yinbo Sun, Yu Liu, Yuliang Guo, Yang Chen, HuiYang Jian, Lintao Ma, Xingyu Lu, Jun Zhou

Despite recent advances with transformer-based forecasting models, challenges remain due to the non-stationary, nonlinear characteristics of workload time series and the long-term dependencies.

Cloud Computing Management +3

PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

1 code implementation 22 Jul 2024 Xueyan Li, Xinyan Chen, Yazhe Niu, Shuai Hu, Yu Liu

To address the challenge of unquantifiable psychological traits, we introduce a novel training paradigm that involves learning the ranking of proxy variables associated with these traits, culminating in a robust score model for MBTI measurements.

Chatbot

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

1 code implementation 17 Jul 2024 Tomáš Chobola, Yu Liu, Hanyi Zhang, Julia A. Schnabel, Tingying Peng

Current deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios.

Low-Light Image Enhancement

MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

no code implementations 16 Jul 2024 Weimin WANG, Yingxu Deng, Zezeng Li, Yu Liu, Na lei

This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection.

QVD: Post-training Quantization for Video Diffusion Models

no code implementations 16 Jul 2024 Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

Furthermore, we investigate significant inter-channel disparities and asymmetries in the activation of video diffusion models, resulting in low coverage of quantization levels by individual channels and increasing the challenge of quantization.

Computational Efficiency Quantization

Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher

no code implementations 10 Jul 2024 Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, YingMei Wei, Yongxiang Liu

Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain.

object-detection Object Detection +1

Incremental Multiview Point Cloud Registration

no code implementations 6 Jul 2024 Xiaoya Cheng, Yu Liu, Maojun Zhang, Shen Yan

This process primarily constructs a coarse multiview registration and refines the model by adjusting the positions of the keypoints on the Track.

3D Reconstruction Point Cloud Registration +1

A Marginal Distributionally Robust Kalman Filter for Centralized Fusion

no code implementations 6 Jul 2024 Weizhi Chen, Yaowen Li, Yu Liu, You He

State estimation is a fundamental problem for multi-sensor information fusion, essential in applications such as target tracking, power systems, and control automation.

State Estimation

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

2 code implementations 28 Jun 2024 Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li

This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs).

Answer Generation Image Captioning +5

GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting

no code implementations 18 Jun 2024 Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue

These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy.

From Pixels to Progress: Generating Road Network from Satellite Imagery for Socioeconomic Insights in Impoverished Areas

1 code implementation 17 Jun 2024 Yanxin Xi, Yu Liu, Zhicheng Liu, Sasu Tarkoma, Pan Hui, Yong Li

The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas.

Decoder

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

1 code implementation 15 Jun 2024 Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, Yu Liu

To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space.

Multi-Task Learning Reinforcement Learning (RL)

Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

no code implementations 14 Jun 2024 Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng

In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames.

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

1 code implementation 14 Jun 2024 Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, Tsung-Yi Ho

As a result, previous debiasing methods mainly finetune or even pre-train language models on newly constructed anti-stereotypical datasets, which are high-cost.

Fairness Language Modeling +1

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

1 code implementation 11 Jun 2024 Yu Liu, Lang Gao, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen

However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking.

Vulnerability Detection

Zero-shot Image Editing with Reference Imitation

1 code implementation 11 Jun 2024 Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

Image editing is a practical yet challenging task given the diverse demands of users, where one of the hardest parts is to precisely describe what the edited image should look like.

Semantic correspondence

Instruction-Guided Visual Masking

1 code implementation 30 May 2024 Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan

To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models.

Instruction Following Visual Grounding +1

Enhancing Vision-Language Model with Unmasked Token Alignment

1 code implementation 29 May 2024 Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li

Contrastive pre-training on image-text pairs, exemplified by CLIP, has become a standard technique for learning multi-modal visual-language representations.

Language Modeling Language Modelling +2

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization

1 code implementation CVPR 2024 Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin WANG, Nan Pu

To tackle this problem, we devise a Region-Aligned Proxy Learning (RAPL) framework, which comprises a Channel-wise Region Alignment (CRA) module and a Semi-Supervised Proxy Learning (SemiPL) strategy.

Contrastive Learning Fine-Grained Visual Categorization +3

Batched Stochastic Bandit for Nondegenerate Functions

no code implementations 9 May 2024 Yu Liu, Yunlu Shu, Tianyu Wang

More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}}(A_{+}^d \sqrt{T})$.

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

no code implementations 1 May 2024 Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li

In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise.

Denoising

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

1 code implementation 25 Apr 2024 Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang

Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains.

Board Games Decision Making

Improving TAS Adaptability with a Variable Temperature Threshold

no code implementations 25 Apr 2024 Anthony Dowling, Ming-Cheng Cheng, Yu Liu

Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution.

Scheduling

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

1 code implementation 19 Apr 2024 Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts.

Language Modelling Large Language Model

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

no code implementations CVPR 2024 Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li

This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks.

Decoder Depth Estimation +6

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

1 code implementation CVPR 2024 Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu

Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space.

Denoising Semantic Segmentation

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

2 code implementations 4 Apr 2024 Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.

Attribute Image Captioning +3

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

1 code implementation 25 Mar 2024 Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions.

Visual Question Answering (VQA)

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

1 code implementation 25 Mar 2024 Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo

This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.

Face Swapping Instruction Following +2

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

1 code implementation 20 Mar 2024 Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

1 code implementation 19 Mar 2024 Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.

Text to Image Generation Text-to-Image Generation

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

1 code implementation CVPR 2024 Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu

Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction.

Autonomous Vehicles motion prediction +1

Depth-induced Saliency Comparison Network for Diagnosis of Alzheimer's Disease via Jointly Analysis of Visual Stimuli and Eye Movements

no code implementations 15 Mar 2024 Yu Liu, Wenlin Zhang, Shaochu Wang, Fangyu Zuo, Peiguang Jing, Yong Ji

Early diagnosis of Alzheimer's Disease (AD) is very important for subsequent medical treatment, and eye movements under special visual stimuli may serve as a potential non-invasive biomarker for detecting cognitive abnormalities in AD patients.

CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

1 code implementation CVPR 2024 Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames.

CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning

no code implementations 9 Mar 2024 Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He

Inspired by this, we propose a novel A-O disentangled framework for CZSL, namely Class-specified Cascaded Network (CSCNet).

Attribute Compositional Zero-Shot Learning +2

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

1 code implementation 28 Feb 2024 Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.

Contrastive Learning Decision Making +1

Extensible Multi-Granularity Fusion Network for Aspect-based Sentiment Analysis

1 code implementation 12 Feb 2024 Xiaowei Zhao, Yong Zhou, Xiujuan Xu, Yu Liu

This paper presents the Extensible Multi-Granularity Fusion (EMGF) network, which integrates information from dependency and constituent syntactic, attention semantic, and external knowledge graphs.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Estimating On-road Transportation Carbon Emissions from Open Data of Road Network and Origin-destination Flow Data

1 code implementation 7 Feb 2024 Jinwei Zeng, Yu Liu, Jingtao Ding, Jian Yuan, Yong Li

To relieve this issue by utilizing the strong pattern recognition of artificial intelligence, we incorporate two sources of open data representative of the transportation demand and capacity factors, the origin-destination (OD) flow data and the road network data, to build a hierarchical heterogeneous graph learning method for on-road carbon emission estimation (HENCE).

Graph Learning

Space Group Constrained Crystal Generation

no code implementations 6 Feb 2024 Rui Jiao, Wenbing Huang, Yu Liu, Deli Zhao, Yang Liu

Crystals are the foundation of numerous scientific and industrial applications.

AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data

1 code implementation 1 Feb 2024 Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li

This paper introduces an effective method for computation-efficient personalized style video generation without requiring access to any personalized video data.

Conditional Image Generation Denoising +2

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

no code implementations 30 Jan 2024 Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes.

Vector Graphics

Deep-Learning Channel Estimation for IRS-Assisted Integrated Sensing and Communication System

no code implementations 29 Jan 2024 Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang

This problem is challenging due to the lack of signal processing capacity in passive IRS, as well as the presence of mutual interference between sensing and communication (SAC) signals in ISAC systems.

Integrated sensing and communication ISAC

Extreme Learning Machine-based Channel Estimation in IRS-Assisted Multi-User ISAC System

no code implementations 29 Jan 2024 Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang, Hyundong Shin

Multi-user integrated sensing and communication (ISAC) assisted by intelligent reflecting surface (IRS) has been recently investigated to provide a high spectral and energy efficiency transmission.

Efficient Neural Network Integrated sensing and communication +1

UV-SAM: Adapting Segment Anything Model for Urban Village Identification

1 code implementation 16 Jan 2024 Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, Yong Li

Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities.

image-classification Image Classification +1

Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

no code implementations 10 Jan 2024 Yu Liu, Yuexin Zhang, Kunming Li, Yongliang Qiao, Stewart Worrall, You-Fu Li, He Kong

To overcome this limitation, this paper proposes a graph transformer structure to improve prediction performance, capturing the differences between the various sites and scenarios contained in the datasets.

Autonomous Vehicles Domain Adaptation +3

EasyDrag: Efficient Point-based Manipulation on Diffusion Models

1 code implementation CVPR 2024 Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu, Haihang You

Generative models are gaining increasing popularity and the demand for precisely generating images is on the rise.

Image Manipulation

Multi-agent Collaborative Perception via Motion-aware Robust Communication Network

no code implementations CVPR 2024 Shixin Hong, Yu Liu, Zhi Li, Shaohui Li, You He

Collaborative perception allows for information sharing between multiple agents such as vehicles and infrastructure to obtain a comprehensive view of the environment through communication and fusion.

3D Object Detection object-detection

Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations CVPR 2024 Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text to Image Generation Text-to-Image Generation

Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

no code implementations 23 Dec 2023 Xianjie Zhang, Jiahao Sun, Chen Gong, Kai Wang, Yifei Cao, Hao Chen, Yu Liu

The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (only one passenger can be assigned to a car at a time like UberX and Lyft).

Reinforcement Learning (RL)

Critic-Guided Decision Transformer for Offline Reinforcement Learning

1 code implementation 21 Dec 2023 Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner.

D4RL Offline RL +4

Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

no code implementations 20 Dec 2023 Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song

The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency.

Decision Making

VideoLCM: Video Latent Consistency Model

2 code implementations 14 Dec 2023 Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang

Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models.

Computational Efficiency Image Generation +2

Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

no code implementations 12 Dec 2023 Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu

Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks.

Decision Making Language Modelling +1

A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

1 code implementation 12 Dec 2023 Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.

MuJoCo Offline RL

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

2 code implementations CVPR 2024 Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li

On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g., sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans.

Autonomous Driving Instruction Following

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

no code implementations 12 Dec 2023 Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality.

LivePhoto: Real Image Animation with Text-guided Motion Control

no code implementations 5 Dec 2023 Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao

In particular, considering the facts that (1) text can only describe motions roughly (e.g., regardless of the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.

Image Animation Text-to-Video Generation +1

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

1 code implementation CVPR 2024 Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute Denoising +1

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

no code implementations CVPR 2024 Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance.

Text to Image Generation Text-to-Image Generation

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations 27 Nov 2023 Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text to Image Generation Text-to-Image Generation

Towards Large-scale Masked Face Recognition

no code implementations 25 Oct 2023 Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu

During the COVID-19 coronavirus epidemic, almost everyone is wearing masks, which poses a huge challenge for deep learning-based face recognition algorithms.

Face Recognition

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

no code implementations ICCV 2023 Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li

We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object.

Classification Decoder +2

Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

no code implementations 13 Oct 2023 Lu Li, Yuxin Pan, RuoBing Chen, Jie Liu, Zilin Wang, Yu Liu, Zhiheng Li

Considering that obtaining expert demonstrations can be costly, the focus of current IRL techniques is on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations.

Contrastive Learning

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

1 code implementation NeurIPS 2023 Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu

Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.

Board Games Decision Making +1

Continuous Invariance Learning

no code implementations 9 Oct 2023 Yong Lin, Fan Zhou, Lu Tan, Lintao Ma, Jiameng Liu, Yansu He, Yuan Yuan, Yu Liu, James Zhang, Yujiu Yang, Hao Wang

To address this challenge, we then propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains.

Cloud Computing

Magicremover: Tuning-free Text-guided Image inpainting with Diffusion Models

no code implementations 4 Oct 2023 Siyuan Yang, Lu Zhang, Liqian Ma, Yu Liu, Jingjing Fu, You He

In this paper, we propose MagicRemover, a tuning-free method that leverages the powerful diffusion models for text-guided image inpainting.

Denoising Image Inpainting

Continuous 3D Myocardial Motion Tracking via Echocardiography

no code implementations 4 Oct 2023 Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin

Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally.

Motion Estimation

Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

no code implementations 2 Oct 2023 Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu

Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm.

Scheduling

Liveness Detection Competition -- Noncontact-based Fingerprint Algorithms and Systems (LivDet-2023 Noncontact Fingerprint)

no code implementations 1 Oct 2023 Sandip Purnapatra, Humaira Rezaie, Bhavin Jawade, Yu Liu, Yue Pan, Luke Brosell, Mst Rumana Sumi, Lambert Igene, Alden Dimarco, Srirangaraj Setlur, Soumyabrata Dey, Stephanie Schuckers, Marco Huber, Jan Niklas Kolf, Meiling Fang, Naser Damer, Banafsheh Adami, Raul Chitic, Karsten Seelert, Vishesh Mistry, Rahul Parthe, Umit Kacar

The competition serves as an important benchmark in noncontact-based fingerprint PAD, offering (a) an independent assessment of the state of the art in noncontact-based fingerprint PAD for algorithms and systems, (b) a common evaluation protocol, which includes finger photos of a variety of Presentation Attack Instruments (PAIs) and live fingers, for the biometric research community, and (c) standard algorithm and system evaluation protocols, along with a comparative analysis of state-of-the-art algorithms from academia and industry on both old and new Android smartphones.

All

Towards Generative Modeling of Urban Flow through Knowledge-enhanced Denoising Diffusion

1 code implementation 19 Sep 2023 Zhilun Zhou, Jingtao Ding, Yu Liu, Depeng Jin, Yong Li

To capture the effect of multiple factors on urban flow, such as region features and urban environment, we employ a diffusion model to generate urban flow for regions under different conditions.

Denoising

BigFUSE: Global Context-Aware Image Fusion in Dual-View Light-Sheet Fluorescence Microscopy with Image Formation Prior

no code implementations 5 Sep 2023 Yu Liu, Gesine Muller, Nassir Navab, Carsten Marr, Jan Huisken, Tingying Peng

Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues.

Evaluation Mappings of Spatial Accelerator Based On Data Placement

no code implementations 4 Sep 2023 Zhipeng Wu, Yu Liu

Based on data placement relations, polyAcc accurately analyzes the data volume for different reuse patterns and estimates metrics, including data reuse, latency, and energy.

Relation Scheduling

Snow Removal for LiDAR Point Clouds with Spatio-temporal Conditional Random Fields

1 code implementation IEEE ROBOTICS AND AUTOMATION LETTERS 2023 Weimin WANG, Ting Yang, Yu Du, Yu Liu

The proposed approach first constructs the CRF based on k-nearest neighbors with the snow confidence derived from the physical priors of snow, such as intensity and distribution.

3D Object Detection Autonomous Driving +2

3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability

1 code implementation ICCV 2023 Ruowei Wang, Yu Liu, Pei Su, Jianwei Zhang, Qijun Zhao

Our method utilizes implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes.

3D Shape Generation 3D Shape Representation +1

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

no code implementations 24 Jul 2023 Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.

continuous-control Continuous Control +3

A Physics-Informed Data-Driven Fault Location Method for Transmission Lines Using Single-Ended Measurements with Field Data Validation

no code implementations 19 Jul 2023 Yiqi Xing, Yu Liu, Dayou Lu, Xinchen Zou, Xuming He

This procedure bridges the gap between simulation and practical power systems, and at the same time considers the uncertainty of system and fault parameters in practice.

AnyDoor: Zero-shot Object-level Image Customization

2 code implementations CVPR 2024 Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Object Virtual Try-on

OpenSiteRec: An Open Dataset for Site Recommendation

no code implementations 3 Jul 2023 Xinhang Li, Xiangyu Zhao, Yejing Wang, Yu Liu, Yong Li, Cheng Long, Yong Zhang, Chunxiao Xing

As a representative information retrieval task, site recommendation, which aims at predicting the optimal sites for a brand or an institution to open new branches in an automatic data-driven way, is beneficial and crucial for brand development in modern business.

Benchmarking Information Retrieval +1

Lipschitz Singularities in Diffusion Models

no code implementations 20 Jun 2023 Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models.

Learning Search-Space Specific Heuristics Using Neural Networks

no code implementations 6 Jun 2023 Yu Liu, Ryo Kuroiwa, Alex Fukunaga

We propose and evaluate a system which learns a neural network heuristic function for forward search-based, satisficing classical planning.

regression
