no code implementations • EMNLP 2021 • Kangli Zi, Shi Wang, Yu Liu, Jicun Li, Yanan Cao, Cungen Cao
Sentence Compression (SC), which aims to shorten sentences while retaining important words that express the essential meanings, has been studied for many years in many languages, especially in English.
1 code implementation • ECCV 2020 • Yu Liu, Sarah Parisot, Gregory Slabaugh, Xu Jia, Ales Leonardis, Tinne Tuytelaars
Since those regularization strategies are mostly associated with classifier outputs, we propose a MUlti-Classifier (MUC) incremental learning paradigm that integrates an ensemble of auxiliary classifiers to estimate more effective regularization constraints.
no code implementations • 16 Jun 2025 • Yi Wang, Zhenghong Wang, Fan Zhang, Chengling Tang, Chaogui Kang, Di Zhu, Zhongfu Ma, Sijie Ruan, Weiyu Zhang, Yu Zheng, Philip S. Yu, Yu Liu
Specifically, it (1) estimates two spatially explicit mass parameters based on inflow and outflow, (2) models the likelihood of cross-unit interaction using closed-form solutions of spatial interactions to constrain spatial modeling randomness, and (3) utilizes the learned spatial interaction to guide and mitigate the over-smoothing phenomenon in transformer attention matrices.
no code implementations • 9 Jun 2025 • Yu Liu, Utkarsh Pratiush, Kamyar Barakati, Hiroshi Funakubo, Ching-Che Lin, Jaegyu Kim, Lane W. Martin, Sergei V. Kalinin
Ferroelectric polarization switching underpins the functional performance of a wide range of materials and devices, yet its dependence on complex local microstructural features renders systematic exploration by manual or grid-based spectroscopic measurements impractical.
1 code implementation • arXiv 2025 • Pingyu Wu, Kai Zhu, Yu Liu, Longxiang Tang, Jian Yang, Yansong Peng, Wei Zhai, Yang Cao, Zheng-Jun Zha
Autoregressive image generation aims to predict the next token based on previous ones.
Ranked #12 on Image Generation on ImageNet 256x256
1 code implementation • 5 Jun 2025 • Pingyu Wu, Kai Zhu, Yu Liu, Longxiang Tang, Jian Yang, Yansong Peng, Wei Zhai, Yang Cao, Zheng-Jun Zha
Autoregressive image generation aims to predict the next token based on previous ones.
no code implementations • 3 Jun 2025 • Linya Fu, Yu Liu, Zhijie Liu, Zedong Yang, Zhong-Qiu Wang, Youfu Li, He Kong
We propose AuralNet, a novel 3D multi-source binaural sound source localization approach that localizes overlapping sources in both azimuth and elevation without prior knowledge of the number of sources.
1 code implementation • 3 Jun 2025 • Jinwei Zeng, Yu Liu, Guozhen Zhang, Jingtao Ding, Yuming Lin, Jian Yuan, Yong Li
Our model, OpenCarbon, features two major designs that target the challenges: a cross-modality information extraction and fusion module to extract complementary functionality information from two modules and model their interactions, and a neighborhood-informed aggregation module to capture the spatial contiguity correlations.
no code implementations • 29 May 2025 • Kamyar Barakati, Yu Liu, Hiroshi Funakubo, Sergei V. Kalinin
Domain-wall dynamics in ferroelectric materials are strongly position-dependent since each polar interface is locked into a unique local microstructure.
1 code implementation • 28 May 2025 • Linghan Zhong, Samuel Yuan, Jiyang Zhang, Yu Liu, Pengyu Nie, Junyi Jessy Li, Milos Gligoric
Exceptional behavior tests (EBTs) are crucial in software development for verifying that code correctly handles unwanted events and throws appropriate exceptions.
no code implementations • 25 May 2025 • Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You
Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks.
no code implementations • 21 May 2025 • Xin Bai, Guanyi Chen, Tingting He, Chenlian Zhou, Yu Liu
We formally redefine the ESC task to account for this, proposing a revised formulation that requires generating the full sequence of strategy-utterance pairs given a dialogue history.
1 code implementation • 20 May 2025 • Yu Liu, Weiyao Tao, Tong Xia, Simon Knight, Tingting Zhu
To bridge this gap, in this work, we introduce SurvUnc, a novel meta-model based framework for post-hoc uncertainty quantification for survival models.
no code implementations • 20 May 2025 • Xi Chen, Shen Yan, Juelin Zhu, Chen Chen, Yu Liu, Maojun Zhang
Existing methods predominantly rely on domain adaptation and generalization strategies, often utilizing small-scale models that exhibit limited performance.
no code implementations • 20 May 2025 • Junjie Li, Jiawei Wang, Miyu Li, Yu Liu, Yumei Wang, Haitao Xu
Depth estimation has great potential in obstacle avoidance and navigation for further Mars exploration missions.
no code implementations • 16 May 2025 • Zihan Wang, Hongwei Li, Rui Zhang, Yu Liu, Wenbo Jiang, Wenshu Fan, Qingchuan Zhao, Guowen Xu
To achieve MPMA, we first design a Direct Preference Manipulation Attack (DPMA) that achieves significant effectiveness by inserting manipulative words and phrases into the tool name and description.
no code implementations • 16 May 2025 • Xinran Li, Yu Liu, Xiujuan Xu, Xiaowei Zhao
The automatic diagnosis of chest diseases is a popular and challenging task.
no code implementations • 11 May 2025 • Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, PengFei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei LI, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, ZiHao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song
We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.
Ranked #1 on Video Question Answering on TVBench
1 code implementation • 8 May 2025 • Yu Liu, Fabricio Oliveira
Existing learning-based methods like Neural Two-Stage Stochastic Programming (Neur2SP) employ neural networks (NNs) as recourse function surrogates but rely on computationally intensive mixed-integer programming (MIP) formulations.
no code implementations • 6 May 2025 • Guoting Wei, Yu Liu, Xia Yuan, Xizhe Xue, Linlin Guo, Yifan Yang, Chunxia Zhao, Zongwen Bai, Haokui Zhang, Rong Xiao
Using this label engine, we expand existing aerial detection datasets with rich textual annotations and construct a novel benchmark dataset, called Multi-instance Open-set Aerial Dataset (MI-OAD), addressing the limitations of current remote sensing grounding data and enabling effective open-set aerial detection.
1 code implementation • 24 Apr 2025 • Kai Cui, Jia Li, Yu Liu, Xuesong Zhang, Zhenzhen Hu, Meng Wang
Besides, it introduces Long- and Short-Term Temporal Contrastive Learning (LS-TCL) to capture emotional synchronization at different temporal resolutions within modalities.
no code implementations • 15 Apr 2025 • Dazhong Shen, Guanglu Song, Yi Zhang, Bingqi Ma, Lujundong Li, Dongzhi Jiang, Zhuofan Zong, Yu Liu
To address this problem, we propose an intuitive but effective fine-tuning framework, called Adversarial Diffusion Tuning (ADT), by simulating the inference process during optimization and aligning the final outputs with training data by adversarial supervision.
no code implementations • 10 Apr 2025 • Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng
DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress.
no code implementations • 9 Apr 2025 • Yu Liu, Sergei V. Kalinin
Automated experimentation has the potential to revolutionize scientific discovery, but its effectiveness depends on well-defined optimization targets, which are often uncertain or probabilistic in real-world settings.
no code implementations • 2 Apr 2025 • Chaohu Liu, Tianyi Gui, Yu Liu, Linli Xu
In this paper, we propose AdPO, a novel adversarial defense strategy for LVLMs based on preference optimization.
no code implementations • 28 Mar 2025 • Dailan He, Xiahong Wang, Shulun Wang, Guanglu Song, Bingqi Ma, Hao Shao, Yu Liu, Hongsheng Li
Face swapping aims to seamlessly transfer a source facial identity onto a target while preserving target attributes such as pose and expression.
1 code implementation • 26 Mar 2025 • Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, Kai Zhu, Kang Zhao, Keyu Yan, Lianghua Huang, Mengyang Feng, Ningyi Zhang, Pandeng Li, Pingyu Wu, Ruihang Chu, Ruili Feng, Shiwei Zhang, Siyang Sun, Tao Fang, Tianxing Wang, Tianyi Gui, Tingyu Weng, Tong Shen, Wei Lin, Wei Wang, Wenmeng Zhou, Wente Wang, Wenting Shen, Wenyuan Yu, Xianzhong Shi, Xiaoming Huang, Xin Xu, Yan Kou, Yangyu Lv, Yifei Li, Yijing Liu, Yiming Wang, Yingya Zhang, Yitong Huang, Yong Li, You Wu, Yu Liu, Yulin Pan, Yun Zheng, Yuntao Hong, Yupeng Shi, Yutong Feng, Zeyinzi Jiang, Zhen Han, Zhi-Fan Wu, Ziyu Liu
Openness: We open-source the entire series of Wan, including source code and all models, with the goal of fostering the growth of the video generation community.
no code implementations • 21 Mar 2025 • Weimin WANG, Yu Du, Ting Yang, Yu Liu
Consequently, a cost function involving extrinsic calibration parameters is formulated based on the spatial overlap of 3D grids and LiDAR points.
no code implementations • 18 Mar 2025 • Yulin Pan, Xiangteng He, Chaojie Mao, Zhen Han, Zeyinzi Jiang, Jingfeng Zhang, Yu Liu
In this paper, we propose ICE-Bench, a unified and comprehensive benchmark designed to rigorously assess image generation models.
no code implementations • 17 Mar 2025 • Yu Liu, Hanbin Jiang, Lei Zhu, Yu Zhang, Yuqi Mao, Jiangxia Cao, Shuchao Pang
In the real world, users typically have multiple interests and use different services to enrich their daily lives, e.g., watching popular short videos or live streams.
2 code implementations • 10 Mar 2025 • Zeyinzi Jiang, Zhen Han, Chaojie Mao, Jingfeng Zhang, Yulin Pan, Yu Liu
Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation.
no code implementations • 9 Mar 2025 • Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li
Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications.
Out of Distribution (OOD) Detection
no code implementations • 6 Mar 2025 • Kanghui Ning, Zijie Pan, Yu Liu, Yushan Jiang, James Y. Zhang, Kashif Rasul, Anderson Schneider, Lintao Ma, Yuriy Nevmyvaka, Dongjin Song
Recently, Large Language Models (LLMs) and Foundation Models (FMs) have become prevalent for time series forecasting tasks.
1 code implementation • 5 Mar 2025 • Nianzu Yang, Pandeng Li, Liming Zhao, Yang Li, Chen-Wei Xie, Yehui Tang, Xudong Lu, Zhihang Liu, Yun Zheng, Yu Liu, Junchi Yan
Extensive experiments demonstrate that CDT, trained from scratch using only a basic MSE diffusion loss for reconstruction along with a KL term and an LPIPS perceptual loss, achieves state-of-the-art performance in video reconstruction tasks with just single-step sampling.
no code implementations • CVPR 2025 • Kun Yang, Yuxiang Liu, Zeyu Cui, Yu Liu, Maojun Zhang, Shen Yan, Qing Wang
Thermal infrared imaging offers the advantage of all-weather capability, enabling non-intrusive measurement of an object's surface temperature.
1 code implementation • 2 Mar 2025 • Guanlue Li, Chenran Jiang, Ziqi Gao, Yu Liu, Chenyang Liu, Jiean Chen, Yong Huang, Jia Li
Effective generation of molecular structures, or new chemical entities, that bind to target proteins is crucial for lead identification and optimization in drug discovery.
1 code implementation • 26 Feb 2025 • Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, Siyuan Huang
Existing methods often fail to effectively integrate information across different object states, limiting the accuracy of part-mesh reconstruction and part dynamics modeling, particularly for complex multi-part articulated objects.
no code implementations • 23 Feb 2025 • Kamyar Barakati, Yu Liu, Utkarsh Pratiush, Boris N. Slautin, Sergei V. Kalinin
They can function as wrappers over classical and DCNN-based methods, making them applicable to both unsupervised and supervised workflows (e.g., classification, regression for structure-property mapping) across imaging and hyperspectral data.
no code implementations • 7 Feb 2025 • Siqi Shen, Yu Liu, Daniel Biggs, Omar Hafez, Jiandong Yu, Wentao Zhang, Bin Cui, Jiulong Shan
To enable the transfer learning between differently configured SGUNETs, we propose a set of mapping functions to align the parameters between the pre-trained model and the target model.
no code implementations • 5 Feb 2025 • Xiaoshuai Hao, Yunfeng Diao, Mengchuan Wei, Yifan Yang, Peng Hao, Rong Yin, HUI ZHANG, Weiming Li, Shu Zhao, Yu Liu
To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction.
no code implementations • 3 Feb 2025 • Yunchuan Guan, Yu Liu, Ke Zhou, Zhiqi Shen, Jenq-Neng Hwang, Lei LI
Diffusion-based algorithms have emerged as promising techniques for weight generation.
no code implementations • 29 Jan 2025 • Hao Guo, Han Wang, Di Zhu, Lun Wu, A. Stewart Fotheringham, Yu Liu
However, current geographically weighting approaches are ineffective on graph neural networks, yielding no significant improvement in prediction accuracy.
no code implementations • 25 Jan 2025 • zhizhen li, tianyi zhuo, Yifei Cao, Jizhe Yu, Yu Liu
Video stabilization often struggles with distortion and excessive cropping.
no code implementations • 21 Jan 2025 • Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao
In spite of the recent progress, image diffusion models still produce artifacts.
1 code implementation • CVPR 2025 • Jinliang Zheng, Jianxiong Li, Dongxiu Liu, Yinan Zheng, Zhihao Wang, Zhonghong Ou, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
Training on diverse, internet-scale data is a key factor in the success of recent large foundation models.
no code implementations • 16 Jan 2025 • Zixun Fang, Zhiheng Liu, Kai Zhu, Yu Liu, Ka Leong Cheng, Wei Zhai, Yang Cao, Zheng-Jun Zha
Video colorization aims to transform grayscale videos into vivid color representations while maintaining temporal consistency and structural integrity.
no code implementations • CVPR 2025 • Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo
Derived from diffusion models, MangaNinjia specializes in the task of reference-guided line art colorization.
no code implementations • 12 Jan 2025 • Ruizhe Ou, Yuan Hu, Fan Zhang, Jiaxin Chen, Yu Liu
In addition, to address the absence of large-scale datasets for training pixel-level RS MLLMs, we construct the GeoPixInstruct dataset, comprising 65,463 images and 140,412 instances, with each instance annotated with text descriptions, bounding boxes, and masks.
1 code implementation • 10 Jan 2025 • Hao Guo, Weiyu Zhang, Junjie Yang, Yuanqiao Hou, Lei Dong, Yu Liu
However, for decades new mathematical formulas to model mobility phenomena have been scarce and usually discovered by analogy to physical processes, such as the gravity model and the radiation model.
no code implementations • 5 Jan 2025 • Chaojie Mao, Jingfeng Zhang, Yulin Pan, Zeyinzi Jiang, Zhen Han, Yu Liu, Jingren Zhou
There are many models in the community based on the post-training of text-to-image foundational models that meet this training paradigm of the first stage.
no code implementations • 2 Jan 2025 • Xiaoshuai Hao, Guanqun Liu, YuTing Zhao, Yuheng Ji, Mengchuan Wei, Haimei Zhao, Lingdong Kong, Rong Yin, Yu Liu
Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction.
no code implementations • CVPR 2025 • Junfeng Ni, Yu Liu, Ruijie Lu, Zirui Zhou, Song-Chun Zhu, Yixin Chen, Siyuan Huang
To this end, we propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
no code implementations • 28 Dec 2024 • Di Jin, Xing Liu, Yu Liu, Jia Qing Yap, Andrea Wong, Adriana Crespo, Qi Lin, Zhiyuan Yin, Qiang Yan, Ryan Ye
The rapid development of large language models (LLMs) and large vision models (LVMs) has propelled the evolution of multi-modal AI systems, which have demonstrated remarkable potential for industrial applications by emulating human-like cognition.
no code implementations • 24 Dec 2024 • Yu Liu, Rohit Pant, Ichiro Takeuchi, R. Jackson Spurling, Jon-Paul Maria, Maxim Ziatdinov, Sergei V. Kalinin
Here we demonstrate the implementation of fully automated SPM to explore the evolution of ferroelectric properties in combinatorial libraries, focusing on Sm-doped BiFeO3 and ZnxMg1-xO systems.
no code implementations • 19 Dec 2024 • Jingwei Bao, Yu Liu, Zeliang Li, Shuyuan Zhu, Siu-Kei Au Yeung
This paper introduces a framework designed to enhance the color quality in the V-PCC compressed point clouds.
1 code implementation • 17 Dec 2024 • Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Chen Liang, Tong Shen, Han Zhang, Huanzhang Dou, Yu Liu, Jingren Zhou
Building upon this foundation, we present ChatDiT, a zero-shot, general-purpose, and interactive visual generation framework that leverages pretrained diffusion transformers in their original form, requiring no additional tuning, adapters, or modifications.
no code implementations • 17 Dec 2024 • Naveenkumar G Venkataswamy, Yu Liu, Surendra Singh, Soumyabrata Dey, Stephanie Schuckers, Masudul H Imtiaz
However, a thorough study of iris recognition using smartphone-captured 'High-Quality' VIS images and cross-spectral matching with previously enrolled NIR images has not been conducted.
no code implementations • CVPR 2025 • Ruijie Lu, Yixin Chen, Junfeng Ni, Baoxiong Jia, Yu Liu, Diwen Wan, Gang Zeng, Siyuan Huang
Repurposing pre-trained diffusion models has been proven to be effective for NVS.
1 code implementation • CVPR 2025 • Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, Yu Liu
Real-world design tasks - such as picture book creation, film storyboard development using character sets, photo retouching, visual effects, and font transfer - are highly diverse and complex, requiring deep interpretation and extraction of various elements from instructions, descriptions, and reference images.
no code implementations • 15 Dec 2024 • Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
Our approach effectively mitigates key challenges in video face swapping, including temporal flickering, identity preservation, and robustness to occlusions and pose variations.
no code implementations • 15 Dec 2024 • Xutao Liao, Shaohui Li, Yuhui Xu, Zhi Li, Yu Liu, You He
To further enhance performance, we propose sparsely coded residuals to reduce the errors caused by low-rank approximation on the first- and second-order moments of the optimizers and weight updates.
no code implementations • 12 Dec 2024 • Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li
To effectively exploit consistent visual elements within multiple images, we leverage the multi-image comprehension and instruction-following capabilities of the multimodal large language model (MLLM), prompting it to capture consistent visual elements based on the instruction.
no code implementations • CVPR 2025 • Yunpeng Liu, Boxiao Liu, Yi Zhang, Xingzhong Hou, Guanglu Song, Yu Liu, Haihang You
Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity of this curriculum, then ensure that the curriculum maintains consistent learning complexity across different timesteps by having the teacher model iterate more steps when the noise intensity is low.
no code implementations • 4 Dec 2024 • Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang
We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments.
1 code implementation • 2 Dec 2024 • Jinouwen Zhang, Rongkun Xue, Yazhe Niu, Yun Chen, Jing Yang, Hongsheng Li, Yu Liu
However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models.
no code implementations • 29 Nov 2024 • Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang
Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored.
no code implementations • 19 Nov 2024 • Kamyar Barakati, Yu Liu, Chris Nelson, Maxim A. Ziatdinov, Xiaohang Zhang, Ichiro Takeuchi, Sergei V. Kalinin
We demonstrate that a reward-driven approach can be used to optimize these key hyperparameters across the full workflow, where rewards were designed to reflect domain wall continuity and straightness, ensuring that the analysis aligns with the material's physical behavior.
no code implementations • 19 Nov 2024 • Yu Liu, Ruowei Wang, Jiaqi Li, Zixiang Xu, Qijun Zhao
The latest advances for single-image 3D reconstruction extract a textual description from the input image and further utilize it to synthesize 3D models.
no code implementations • 16 Nov 2024 • Huafeng Li, Jiaqi Fang, Yafei Zhang, Yu Liu
To address this, we propose a joint learning framework that utilizes infrared images for the restoration and fusion of hazy IR-VIS images.
no code implementations • 14 Nov 2024 • Zengyi Yang, Yafei Zhang, Huafeng Li, Yu Liu
The primary value of infrared and visible image fusion technology lies in applying the fusion results to downstream tasks.
no code implementations • CVPR 2025 • Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha
Specifically, the KTC architecture divides the latent space into two branches, in which one half completely inherits the compression prior of keyframes from a lower-dimension image VAE while the other half involves temporal compression through 3D group causal convolution, reducing temporal-spatial conflicts and accelerating the convergence speed of video VAE.
no code implementations • 9 Nov 2024 • Yu Liu, Shu Yang, Jingtao Ding, Quanming Yao, Yong Li
To tackle this issue, in this paper, we generalize the hyperedge expansion in hypergraph learning and propose an equivalent transformation for HKG modeling, referred to as TransEQ.
1 code implementation • 4 Nov 2024 • Yanyi Zhang, Binglin Qiu, Qi Jia, Yu Liu, Ran He
Most incremental learners excessively prioritize coarse classes of objects while neglecting various kinds of states (e.g., color and material) attached to the objects.
1 code implementation • 31 Oct 2024 • Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou
While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.
no code implementations • 29 Oct 2024 • Zhilun Zhou, Jingyang Fan, Yu Liu, Fengli Xu, Depeng Jin, Yong Li
Motivated by the remarkable abilities of large language models (LLMs) in commonsense reasoning, embedding, and multi-agent collaboration, in this work, we synergize LLM agents and knowledge graph for socioeconomic prediction.
1 code implementation • 27 Oct 2024 • Yu Liu, Arif Mahmood, Muhammad Haris Khan
To this end, this paper presents NT-VOT211, a new benchmark tailored for evaluating visual object tracking algorithms in challenging night-time conditions.
1 code implementation • 27 Oct 2024 • Yu Liu, Arif Mahmood, Muhammad Haris Khan
RGB video object tracking is a fundamental task in computer vision.
no code implementations • 24 Oct 2024 • Yu Liu, Gaojie Chen, Yun Wen, Qu Luo, Chiya Zhang, Dusit Niyato
Given the limitations of traditional SIC approaches, this paper proposes a novel simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-enabled FD ISAC system, where the STAR-RIS enhances simultaneous communication and target sensing and reduces self-interference (SI) to a level comparable to that of traditional SIC approaches.
no code implementations • 22 Oct 2024 • Yifan Qin, Zhenge Jia, Zheyu Yan, Jay Mok, Manto Yung, Yu Liu, Xuejiao Liu, Wujie Wen, Luhong Liang, Kwang-Ting Tim Cheng, X. Sharon Hu, Yiyu Shi
This paper proposes an ultra-low power, mixed-bit-width sparse convolutional neural network (CNN) accelerator to accelerate ventricular arrhythmia (VA) detection.
no code implementations • 19 Oct 2024 • Lianghua Huang, Wei Wang, Zhi-Fan Wu, Huanzhang Dou, Yupeng Shi, Yutong Feng, Chen Liang, Yu Liu, Jingren Zhou
In this work, we introduce Group Diffusion Transformers (GDTs), a novel framework that unifies diverse visual generation tasks by redefining them as a group generation problem.
1 code implementation • 16 Oct 2024 • Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang
LoD-Loc mainly achieves this goal by aligning the wireframe derived from the LoD projected model with that predicted by the neural network.
1 code implementation • 15 Oct 2024 • Lijie Tao, Haokui Zhang, Haizhao Jing, Yu Liu, Dawei Yan, Guoting Wei, Xizhe Xue
Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in visual language models (VLMs) have pushed this enthusiasm to new heights.
1 code implementation • 11 Oct 2024 • Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
Extensive experiments on multiple datasets demonstrate that SmartPretrain consistently improves the performance of state-of-the-art prediction models across datasets, data splits and main metrics.
no code implementations • 8 Oct 2024 • Linping Zhang, Yu Liu, Xueqian Wang, Gang Li, You He
We reorganize datasets for CBRSOR tasks based on fine-grained ship remote sensing image slices (FGSRSI-23) and military aircraft recognition (MAR20) datasets.
1 code implementation • 3 Oct 2024 • Yunchuan Guan, Yu Liu, Ketong Liu, Ke Zhou, Zhiqi Shen
Based on the above conclusion, we argue that meta-learning has a promising future in the unsupervised area, and thus propose DHM-UHT, a dynamic head meta-learning algorithm with unsupervised heterogeneous task construction.
1 code implementation • 3 Oct 2024 • Boris N. Slautin, Yu Liu, Jan Dec, Vladimir V. Shvartsman, Doru C. Lupascu, Maxim Ziatdinov, Sergei V. Kalinin
We have developed a Bayesian optimization (BO) workflow that integrates intra-step noise optimization into automated experimental cycles.
no code implementations • 2 Oct 2024 • Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan
Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions.
no code implementations • 30 Sep 2024 • Zhen Han, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang, Chaojie Mao, ChenWei Xie, Yu Liu, Jingren Zhou
To comprehensively evaluate the performance of our model, we establish a benchmark of manually annotated paired data across a variety of visual generation tasks.
no code implementations • 24 Sep 2024 • Fuxian Huang, Qi Zhang, Shaopeng Zhai, Jie Wang, Tianyi Zhang, Haoran Zhang, Ming Zhou, Yu Liu, Yu Qiao
Then, we deploy contrastive learning to train the CLSP encoder to effectively represent precise state information.
1 code implementation • 23 Sep 2024 • Di Cheng, Yingjie Shi, ShiXin Sun, JiaFu Zhang, WeiJing Wang, Yu Liu
Multimodal clothing image editing refers to the precise adjustment and modification of clothing images using data such as textual descriptions and visual images as control conditions, which improves designers' work efficiency and lowers the barrier to design for users.
no code implementations • 19 Sep 2024 • Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Chaoyou Fu, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li
We further present an error analysis revealing that current LMMs still struggle to fully grasp multimodal search tasks, and conduct an ablation study indicating the potential of scaling test-time computation for AI search engines.
no code implementations • 10 Sep 2024 • Yu Liu
We tackle the problem of pricing Chinese convertible bonds (CCBs) using Monte Carlo simulation and dynamic programming.
1 code implementation • 5 Sep 2024 • Kezhou Ren, Meihan Jin, Huiming Liu, Yongxi Gong, Yu Liu
We find that cyclists focus on specific street visual elements when making route decisions, which can be summarized as their attention to safety, street enclosure, and cycling comfort.
Explainable Artificial Intelligence (XAI)
1 code implementation • 23 Aug 2024 • Tianze Zheng, Ailun Wang, Xu Han, Yu Xia, Xingyuan Xu, Jiawei Zhan, Yu Liu, Yang Chen, Zhi Wang, Xiaojie Wu, Sheng Gong, Wen Yan
A force field is a critical component in molecular dynamics simulations for computational drug discovery.
no code implementations • 22 Aug 2024 • Guoting Wei, Xia Yuan, Yu Liu, Zhenhao Shang, Kelu Yao, Chao Li, Qingsen Yan, Chunxia Zhao, Haokui Zhang, Rong Xiao
Then, we propose Bidirectional Vision-Language Fusion (Bi-VLF), which includes a dual-attention fusion encoder and a multi-level text-guided Fusion Decoder.
no code implementations • 13 Aug 2024 • Yu Liu, Baoxiong Jia, Yixin Chen, Siyuan Huang
The ability to distill object-centric abstractions from intricate visual scenes underpins human-level generalization.
1 code implementation • 12 Aug 2024 • Yixin Guo, Yu Liu, Jianghao Li, Weimin WANG, Qi Jia
Then, we extract realistic features of seen samples and mix them with synthetic features together, allowing the model to train seen and unseen classes jointly.
Human-Object Interaction Detection
Zero-Shot Human-Object Interaction Detection
no code implementations • 11 Aug 2024 • Rukai Wei, Heng Cui, Yu Liu, Yufeng Hou, Yanzhao Xie, Ke Zhou
Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics and effectively bridge the modality gap between 2D and 3D.
no code implementations • 7 Aug 2024 • Yu Liu, Roger Proksch, Jason Bemis, Utkarsh Pratiush, Astita Dubey, Mahshid Ahmadi, Reece Emery, Philip D. Rack, Yu-Chen Liu, Jan-Chi Yang, Sergei V. Kalinin
This automated workflow gives optimal scanning parameters for different probes and samples and gives high-quality SPM images consistently in the attractive mode.
no code implementations • 29 Jul 2024 • Shiyu Wang, Zhixuan Chu, Yinbo Sun, Yu Liu, Yuliang Guo, Yang Chen, HuiYang Jian, Lintao Ma, Xingyu Lu, Jun Zhou
Despite recent advances with transformer-based forecasting models, challenges remain due to the non-stationary, nonlinear characteristics of workload time series and the long-term dependencies.
1 code implementation • 22 Jul 2024 • Xueyan Li, Xinyan Chen, Yazhe Niu, Shuai Hu, Yu Liu
To address the challenge of unquantifiable psychological traits, we introduce a novel training paradigm that involves learning the ranking of proxy variables associated with these traits, culminating in a robust score model for MBTI measurements.
1 code implementation • 17 Jul 2024 • Tomáš Chobola, Yu Liu, Hanyi Zhang, Julia A. Schnabel, Tingying Peng
Current deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios.
no code implementations • 16 Jul 2024 • Weimin WANG, Yingxu Deng, Zezeng Li, Yu Liu, Na lei
This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection.
no code implementations • 16 Jul 2024 • Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie
Furthermore, we investigate significant inter-channel disparities and asymmetries in the activation of video diffusion models, resulting in low coverage of quantization levels by individual channels and increasing the challenge of quantization.
no code implementations • 10 Jul 2024 • Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, YingMei Wei, Yongxiang Liu
Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain.
no code implementations • 6 Jul 2024 • Xiaoya Cheng, Yu Liu, Maojun Zhang, Shen Yan
This process primarily constructs a coarse multiview registration and refines the model by adjusting the positions of the keypoints on the Track.
no code implementations • 6 Jul 2024 • Weizhi Chen, Yaowen Li, Yu Liu, You He
State estimation is a fundamental problem for multi-sensor information fusion, essential in applications such as target tracking, power systems, and control automation.
no code implementations • CVPR 2025 • Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng
Advancements in large Vision-Language Models have brought precise, accurate image captioning, vital for advancing multi-modal image understanding and processing.
2 code implementations • 28 Jun 2024 • Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li
This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs).
Ranked #139 on Visual Question Answering on MM-Vet
no code implementations • 19 Jun 2024 • Songyang Chen, Yu Liu, Lei Zou, Zexuan Wang, Youfang Lin
We investigate the model expressiveness from two aspects.
no code implementations • 18 Jun 2024 • Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, James Zhang, Jun Zhou, Hongyuan Mei, Weitao Lin, Zi Zhuang, Wenxin Ning, Yunhua Hu, Siqiao Xue
These methods merely take the temporal hierarchical structure to maintain coherence without improving the forecasting accuracy.
1 code implementation • 17 Jun 2024 • Yanxin Xi, Yu Liu, Zhicheng Liu, Sasu Tarkoma, Pan Hui, Yong Li
The Sustainable Development Goals (SDGs) aim to resolve societal challenges, such as eradicating poverty and improving the lives of vulnerable populations in impoverished areas.
no code implementations • 17 Jun 2024 • Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu
To deal with this issue, we propose a novel framework to fully harness the capabilities of LLMs.
no code implementations • 15 Jun 2024 • Ying Fu, Yu Li, ShaoDi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, Yunkang Zhang, Siyuan Jiang, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Haiyang Xie, Jian Zhao, Shihua Huang, Peng Cheng, Xi Shen, Zheng Wang, Shuai An, Caizhi Zhu, Xuelong Li, Tao Zhang, Liang Li, Yu Liu, Chenggang Yan, Gengchen Zhang, Linyan Jiang, Bingyi Song, Zhuoyu An, Haibo Lei, Qing Luo, Jie Song, YuAn Liu, Haoyuan Zhang, Lingfeng Wang, Wei Chen, Aling Luo, Cheng Li, Jun Cao, Shu Chen, Zifei Dou, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Xuejian Gou, Qinliang Wang, Yang Liu, Shizhan Zhao, Yanzhao Zhang, Libo Yan, Yuwei Guo, Guoxin Li, Qiong Gao, Chenyue Che, Long Sun, Xiang Chen, Hao Li, Jinshan Pan, Chuanlong Xie, Hongming Chen, Mingrui Li, Tianchen Deng, Jingwei Huang, Yufeng Li, Fei Wan, Bingxin Xu, Jian Cheng, Hongzhe Liu, Cheng Xu, Yuxiang Zou, Weiguo Pan, Songyin Dai, Sen Jia, Junpei Zhang, Puhua Chen, Qihang Li
The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies.
1 code implementation • 15 Jun 2024 • Yuan Pu, Yazhe Niu, Zhenjie Yang, Jiyuan Ren, Hongsheng Li, Yu Liu
To overcome these limitations, we introduce UniZero, a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space.
no code implementations • 14 Jun 2024 • Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng
In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames.
1 code implementation • 14 Jun 2024 • Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, Tsung-Yi Ho
As a result, previous debiasing methods mainly finetune or even pre-train language models on newly constructed anti-stereotypical datasets, which are high-cost.
1 code implementation • 11 Jun 2024 • Yu Liu, Lang Gao, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen
However, sound and comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating the performance of LLMs in this more specialized scenario is still lacking.
1 code implementation • 11 Jun 2024 • Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao
Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe what the edited image should look like.
no code implementations • 30 May 2024 • Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Jingyuan Wang, Zhengchao Chen, Hao Lu, Jiayi Li, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, Yong Zhang, Longbiao Chen, Xiaoping Du, Xiang Li, Xueying Zhang, Kun Qin, Zhaoya Gong, Weihua Dong, Xiaofeng Meng
This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models.
1 code implementation • 30 May 2024 • Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan
To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models.
Ranked #1 on Visual Question Answering on V*bench
1 code implementation • 29 May 2024 • Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li
Contrastive pre-training on image-text pairs, exemplified by CLIP, has become a standard technique for learning multi-modal visual-language representations.
1 code implementation • 28 May 2024 • Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Xiaogang Wang, Hongsheng Li
In this paper, we identify three key flaws in the current design of Latent Consistency Models (LCMs).
no code implementations • 20 May 2024 • Yu Liu, Utkarsh Pratiush, Jason Bemis, Roger Proksch, Reece Emery, Philip D. Rack, Yu-Chen Liu, Jan-Chi Yang, Stanislav Udovenko, Susan Trolier-McKinstry, Sergei V. Kalinin
The rapid development of computation power and machine learning algorithms has paved the way for automating scientific discovery with a scanning probe microscope (SPM).
1 code implementation • CVPR 2024 • Yu Liu, Yaqi Cai, Qi Jia, Binglin Qiu, Weimin WANG, Nan Pu
To tackle this problem, we devise a Region-Aligned Proxy Learning (RAPL) framework, which comprises a Channel-wise Region Alignment (CRA) module and a Semi-Supervised Proxy Learning (SemiPL) strategy.
no code implementations • 9 May 2024 • Yu Liu, Yunlu Shu, Tianyu Wang
More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{{\mathcal{O}}} ( A_{+}^d \sqrt{T} )$.
no code implementations • 1 May 2024 • Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li
In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise.
1 code implementation • 25 Apr 2024 • Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang
Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains.
no code implementations • 25 Apr 2024 • Anthony Dowling, Ming-Cheng Cheng, Yu Liu
Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution.
1 code implementation • 19 Apr 2024 • Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu
In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts.
no code implementations • 17 Apr 2024 • Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao
3D Gaussians have recently emerged as an efficient representation for novel view synthesis.
no code implementations • CVPR 2024 • Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li
This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks.
1 code implementation • CVPR 2024 • Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu
Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space.
2 code implementations • 4 Apr 2024 • Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li
We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.
1 code implementation • 28 Mar 2024 • Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs.
1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions.
1 code implementation • 25 Mar 2024 • Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo
This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.
1 code implementation • 20 Mar 2024 • Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li
We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.
1 code implementation • 19 Mar 2024 • Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.
1 code implementation • CVPR 2024 • Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction.
no code implementations • 15 Mar 2024 • Yu Liu, Wenlin Zhang, Shaochu Wang, Fangyu Zuo, Peiguang Jing, Yong Ji
Early diagnosis of Alzheimer's Disease (AD) is very important for following medical treatments, and eye movements under special visual stimuli may serve as a potential non-invasive biomarker for detecting cognitive abnormalities of AD patients.
1 code implementation • CVPR 2024 • Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu
Specifically, the ITA module aggregates temporal information from consecutive frames and coding priors, while the MNA module globally captures spatial information guided by residual frames.
no code implementations • 9 Mar 2024 • Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He
Inspired by this, we propose a novel A-O disentangled framework for CZSL, namely Class-specified Cascaded Network (CSCNet).
1 code implementation • 28 Feb 2024 • Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan
Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding.
Ranked #1 on Contrastive Learning on 10,000 People - Human Pose Recognition Data (using extra training data)
no code implementations • 15 Feb 2024 • Yu Liu, Zibo Wang, Yifei Zhu, Chen Chen
We also theoretically prove the existence of a fairness-efficiency tradeoff in privacy budgeting.
1 code implementation • 12 Feb 2024 • Xiaowei Zhao, Yong Zhou, Xiujuan Xu, Yu Liu
This paper presents the Extensible Multi-Granularity Fusion (EMGF) network, which integrates information from dependency and constituent syntactic, attention semantic, and external knowledge graphs.
Aspect-Based Sentiment Analysis
Aspect-Based Sentiment Analysis (ABSA)
1 code implementation • 7 Feb 2024 • Jinwei Zeng, Yu Liu, Jingtao Ding, Jian Yuan, Yong Li
To relieve this issue by utilizing the strong pattern recognition of artificial intelligence, we incorporate two sources of open data representative of the transportation demand and capacity factors, the origin-destination (OD) flow data and the road network data, to build a hierarchical heterogeneous graph learning method for on-road carbon emission estimation (HENCE).
no code implementations • 6 Feb 2024 • Rui Jiao, Wenbing Huang, Yu Liu, Deli Zhao, Yang Liu
Crystals are the foundation of numerous scientific and industrial applications.
1 code implementation • 1 Feb 2024 • Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li
This paper introduces an effective method for computation-efficient personalized style video generation without requiring access to any personalized video data.
no code implementations • 30 Jan 2024 • Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan
To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes.
no code implementations • 29 Jan 2024 • Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang
This problem is challenging due to the lack of signal processing capacity in passive IRS, as well as the presence of mutual interference between sensing and communication (SAC) signals in ISAC systems.
no code implementations • 29 Jan 2024 • Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang
A deep-learning framework is proposed to estimate the sensing and communication (S&C) channels in such a system.
no code implementations • 29 Jan 2024 • Yu Liu, Ibrahim Al-Nahhal, Octavia A. Dobre, Fanggang Wang, Hyundong Shin
Multi-user integrated sensing and communication (ISAC) assisted by intelligent reflecting surface (IRS) has been recently investigated to provide a high spectral and energy efficiency transmission.
Efficient Neural Network
Integrated sensing and communication
1 code implementation • 16 Jan 2024 • Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, Yong Li
Urban villages, defined as informal residential areas in or around urban centers, are characterized by inadequate infrastructures and poor living conditions, closely related to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities.
no code implementations • 10 Jan 2024 • Yu Liu, Yuexin Zhang, Kunming Li, Yongliang Qiao, Stewart Worrall, You-Fu Li, He Kong
To overcome this limitation, this paper proposes a graph transformer structure to improve prediction performance, capturing the differences between the various sites and scenarios contained in the datasets.
1 code implementation • CVPR 2024 • Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu, Haihang You
Generative models are gaining increasing popularity and the demand for precisely generating images is on the rise.
no code implementations • CVPR 2024 • Shixin Hong, Yu Liu, Zhi Li, Shaohui Li, You He
Collaborative perception allows for information sharing between multiple agents such as vehicles and infrastructure to obtain a comprehensive view of the environment through communication and fusion.
no code implementations • CVPR 2024 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu
To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.
1 code implementation • 31 Dec 2023 • Run Shao, Cheng Yang, Qiujun Li, Qing Zhu, Yongjun Zhang, Yansheng Li, Yu Liu, Yong Tang, Dapeng Liu, Shizhong Yang, Haifeng Li
To maintain modality autonomy, AllSpark uses modality-specific encoders to extract the tokens of various spatio-temporal modalities.
no code implementations • 23 Dec 2023 • Xianjie Zhang, Jiahao Sun, Chen Gong, Kai Wang, Yifei Cao, Hao Chen, Yu Liu
The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (only one passenger can be assigned to a car at a time like UberX and Lyft).
1 code implementation • 21 Dec 2023 • Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen
The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data.
1 code implementation • 21 Dec 2023 • Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao
Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner.
no code implementations • 21 Dec 2023 • Peng Zhao, Jiehua Zhang, Bowen Peng, Longguang Wang, YingMei Wei, Yu Liu, Li Liu
2) BNNs consistently exhibit better adversarial robustness under black-box attacks.
no code implementations • 20 Dec 2023 • Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song
The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency.
2 code implementations • 14 Dec 2023 • Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang
Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models.
no code implementations • 12 Dec 2023 • Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu
Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks.
1 code implementation • 12 Dec 2023 • Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang
In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining.
2 code implementations • CVPR 2024 • Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li
On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g., sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans.
no code implementations • 12 Dec 2023 • Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha
Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality.
1 code implementation • CVPR 2024 • Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
no code implementations • 5 Dec 2023 • Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao
In particular, considering the facts that (1) text can only describe motions roughly (e.g., regardless of the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.
1 code implementation • CVPR 2024 • Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou
Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.
no code implementations • CVPR 2024 • Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang
Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance.
no code implementations • 27 Nov 2023 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu
To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.
1 code implementation • 9 Nov 2023 • Zhenyu Han, Yanxin Xi, Tong Xia, Yu Liu, Yong Li
The built environment supports all daily activities and shapes our health.
no code implementations • 29 Oct 2023 • Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou
Compressing videos into binary codes can improve retrieval speed and reduce storage overhead.
no code implementations • 25 Oct 2023 • Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu
During the COVID-19 coronavirus epidemic, almost everyone is wearing masks, which poses a huge challenge for deep learning-based face recognition algorithms.
no code implementations • ICCV 2023 • Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object.
no code implementations • 18 Oct 2023 • Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang
Building a single generalist agent with strong zero-shot capability has recently sparked significant advancements.
no code implementations • 13 Oct 2023 • Lu Li, Yuxin Pan, RuoBing Chen, Jie Liu, Zilin Wang, Yu Liu, Zhiheng Li
Considering that obtaining expert demonstrations can be costly, the focus of current IRL techniques is on learning a better-than-demonstrator policy using a reward function derived from sub-optimal demonstrations.
1 code implementation • NeurIPS 2023 • Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.
no code implementations • 9 Oct 2023 • Yong Lin, Fan Zhou, Lu Tan, Lintao Ma, Jiameng Liu, Yansu He, Yuan Yuan, Yu Liu, James Zhang, Yujiu Yang, Hao Wang
To address this challenge, we then propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains.
no code implementations • ICCV 2023 • Shiyue Cao, Yueqin Yin, Lianghua Huang, Yu Liu, Xin Zhao, Deli Zhao, Kaiqi Huang
Vector-quantized image modeling has shown great potential in synthesizing high-quality images.
no code implementations • 4 Oct 2023 • Siyuan Yang, Lu Zhang, Liqian Ma, Yu Liu, Jingjing Fu, You He
In this paper, we propose MagicRemover, a tuning-free method that leverages the powerful diffusion models for text-guided image inpainting.
no code implementations • 4 Oct 2023 • Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin
Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally.
no code implementations • 2 Oct 2023 • Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu
Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm.
no code implementations • 1 Oct 2023 • Sandip Purnapatra, Humaira Rezaie, Bhavin Jawade, Yu Liu, Yue Pan, Luke Brosell, Mst Rumana Sumi, Lambert Igene, Alden Dimarco, Srirangaraj Setlur, Soumyabrata Dey, Stephanie Schuckers, Marco Huber, Jan Niklas Kolf, Meiling Fang, Naser Damer, Banafsheh Adami, Raul Chitic, Karsten Seelert, Vishesh Mistry, Rahul Parthe, Umit Kacar
The competition serves as an important benchmark in noncontact-based fingerprint PAD, offering (a) an independent assessment of the state of the art in noncontact-based fingerprint PAD for algorithms and systems, (b) a common evaluation protocol for the biometric research community, which includes finger photos of a variety of Presentation Attack Instruments (PAIs) and live fingers, and (c) standard algorithm and system evaluation protocols, along with a comparative analysis of state-of-the-art algorithms from academia and industry on both old and new Android smartphones.
1 code implementation • 19 Sep 2023 • Zhilun Zhou, Jingtao Ding, Yu Liu, Depeng Jin, Yong Li
To capture the effect of multiple factors on urban flow, such as region features and urban environment, we employ a diffusion model to generate urban flow for regions under different conditions.
no code implementations • 5 Sep 2023 • Yu Liu, Gesine Muller, Nassir Navab, Carsten Marr, Jan Huisken, Tingying Peng
Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues.
no code implementations • 4 Sep 2023 • Zhipeng Wu, Yu Liu
Based on data placement relations, polyAcc accurately analyzes the data volume for different reuse patterns and estimates metrics, including data reuse, latency, and energy.
1 code implementation • IEEE ROBOTICS AND AUTOMATION LETTERS 2023 • Weimin WANG, Ting Yang, Yu Du, Yu Liu
The proposed approach first constructs the CRF based on k-nearest neighbors with the snow confidence derived from the physical priors of snow, such as intensity and distribution.
1 code implementation • 1 Aug 2023 • Yanxin Xi, Yu Liu, Tong Li, Jintao Ding, Yunke Zhang, Sasu Tarkoma, Yong Li, Pan Hui
Satellite imagery in particular is a potential data source for studying sustainable urban development.
1 code implementation • ICCV 2023 • Ruowei Wang, Yu Liu, Pei Su, Jianwei Zhang, Qijun Zhao
Our method utilizes implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes.
no code implementations • 24 Jul 2023 • Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency.
no code implementations • 19 Jul 2023 • Yiqi Xing, Yu Liu, Dayou Lu, Xinchen Zou, Xuming He
This procedure bridges the gap between simulation and practical power systems, and at the same time considers the uncertainty of system and fault parameters in practice.
2 code implementations • CVPR 2024 • Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.
3 code implementations • 7 Jul 2023 • Shilong Zhang, Peize Sun, Shoufa Chen, Min Xiao, Wenqi Shao, Wenwei Zhang, Yu Liu, Kai Chen, Ping Luo
Before sending to LLM, the reference is replaced by RoI features and interleaved with language embeddings as a sequence.
Ranked #1 on Visual Question Answering (VQA) on VCR (Q-AR) test
no code implementations • 3 Jul 2023 • Xinhang Li, Xiangyu Zhao, Yejing Wang, Yu Liu, Yong Li, Cheng Long, Yong Zhang, Chunxiao Xing
As a representative information retrieval task, site recommendation, which aims at predicting the optimal sites for a brand or an institution to open new branches in an automatic data-driven way, is beneficial and crucial for brand development in modern business.
no code implementations • 20 Jun 2023 • Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng
Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models.
no code implementations • 6 Jun 2023 • Yu Liu, Ryo Kuroiwa, Alex Fukunaga
We propose and evaluate a system which learns a neural network heuristic function for forward search-based, satisficing classical planning.