1 code implementation • 17 Mar 2025 • Dingkang Liang, Dingyuan Zhang, Xin Zhou, Sifan Tu, Tianrui Feng, Xiaofan Li, Yumeng Zhang, Mingyang Du, Xiao Tan, Xiang Bai
Extensive experiments on the nuScenes dataset demonstrate that UniFuture outperforms specialized models on future generation and perception tasks, highlighting the advantages of a unified, structurally-aware world model.
no code implementations • 10 Mar 2025 • Zhuowen Zheng, Yain-Whar Si, Xiaochen Yuan, Junwei Duan, Ke Wang, Xiaofan Li, Xinyuan Zhang, Xueyuan Gong
Nowadays, with the advancement of deep neural networks (DNNs) and the availability of large-scale datasets, face recognition (FR) models have achieved exceptional performance.
no code implementations • 8 Mar 2025 • Si Zhou, Yain-Whar Si, Xiaochen Yuan, Xiaofan Li, Xiaoxiang Liu, Xinyuan Zhang, Cong Lin, Xueyuan Gong
In neural networks, there are various methods of feature fusion.
no code implementations • 5 Mar 2025 • Jinhui Zheng, Zhiquan Liu, Yain-Whar Si, Jianqing Li, Xinyuan Zhang, Xiaofan Li, HaoZhi Huang, Xueyuan Gong
One of the most advanced models for this task is Vertical Attention Network (VAN), which utilizes a Vertical Attention Module (VAM) to implicitly segment paragraph text images into text lines, thereby reducing the difficulty of the recognition task.
1 code implementation • 5 Mar 2025 • Zhao Yang, Zezhong Qian, Xiaofan Li, Weixiang Xu, Gongpeng Zhao, Ruohong Yu, Lingsi Zhu, Longjun Liu
In this work, we present DualDiff, a dual-branch conditional diffusion model designed to enhance driving scene generation across multiple views and video sequences.
no code implementations • 5 Mar 2025 • Qiqi Guo, Zhuowen Zheng, Guanghua Yang, Zhiquan Liu, Xiaofan Li, Jianqing Li, Jinyu Tian, Xueyuan Gong
It enables the model to focus more effectively on hard samples in later training stages and leads to the extraction of highly discriminative face features.
1 code implementation • 27 Feb 2025 • Xiaofan Li, Xin Tan, Zhuo Chen, Zhizhong Zhang, Ruixin Zhang, Rizen Guo, Guanna Jiang, Yulong Chen, Yanyun Qu, Lizhuang Ma, Yuan Xie
Finally, considering the risk of the diffusion model "over-fitting" to normal images, we propose an anomaly-masked network to enhance the condition mechanism of the diffusion model.
1 code implementation • 14 Feb 2025 • Sifan Tu, Xin Zhou, Dingkang Liang, Xingyu Jiang, Yumeng Zhang, Xiaofan Li, Xiang Bai
Driving World Model (DWM), which focuses on predicting scene evolution during the driving process, has emerged as a promising paradigm in pursuing autonomous driving.
no code implementations • 30 Jan 2025 • Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji
It enables users to conveniently specify the desired editing region and the desired dragging direction through the input of 3D masks and pairs of control points, thereby enabling precise control over the extent of editing.
no code implementations • 21 Dec 2024 • Huan Liu, Lingyu Xiao, JiangJiang Liu, Xiaofan Li, Ze Feng, Sen Yang, Jingdong Wang
To understand the factors driving this improvement, we conduct an in-depth analysis of the network architecture, data selection, and training recipe used in public MLLMs.
1 code implementation • 18 Dec 2024 • Yanpeng Sun, Jing Hao, Ke Zhu, Jiang-Jiang Liu, Yuxiang Zhao, Xiaofan Li, Gang Zhang, Zechao Li, Jingdong Wang
We propose to leverage off-the-shelf visual specialists, which were initially trained on annotated images for tasks other than image captioning, to enhance image captions.
no code implementations • 9 Dec 2024 • Xinyue Pei, Xingwei Wang, Min Huang, Yingyang Chen, Xiaofan Li, Theodoros A. Tsiftsis
To address this challenge, the base station (BS) injects artificial noise (AN) to mislead the eavesdroppers (Eves), and a protected zone is employed to create an Eve-exclusion area around the BS.
1 code implementation • 24 Sep 2024 • Lingyu Xiao, Jiang-Jiang Liu, Sen Yang, Xiaofan Li, Xiaoqing Ye, Wankou Yang, Jingdong Wang
In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses.
no code implementations • 22 Apr 2024 • Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, Liujuan Cao
Previous works (e.g., NeRF-Det) have demonstrated that implicit representations can benefit visual 3D perception in indoor scenes where the input images overlap heavily.
no code implementations • 13 Apr 2024 • Zhuyang Xie, Yan Yang, Jie Wang, Xiaorong Liu, Xiaofan Li
To address the aforementioned problems, we propose a trustworthy multimodal sentiment ordinal network (TMSON) to improve performance in sentiment analysis.
1 code implementation • CVPR 2024 • Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma
Vision-language models have greatly improved few-shot industrial anomaly detection, but they usually require hundreds of prompts to be designed through prompt engineering.
no code implementations • 8 Dec 2023 • Jiamu Xu, Xiaoxiang Liu, Xinyuan Zhang, Yain-Whar Si, Xiaofan Li, Zheng Shi, Ke Wang, Xueyuan Gong
Learning the discriminative features of different faces is an important task in face recognition.
no code implementations • 11 Oct 2023 • Xiaofan Li, Yifu Zhang, Xiaoqing Ye
To alleviate the problem, we propose a spatial-temporal consistent diffusion framework DrivingDiffusion, to generate realistic multi-view videos controlled by 3D layout.
no code implementations • 23 Sep 2023 • Xiaofan Li, Bo Peng, Jie Hu, Changyou Ma, DaiPeng Yang, Zhuyang Xie
Rather than risk potential pseudo-labeling errors or learning confusion by forcefully classifying these regions, we consider them as uncertainty regions, exempting them from pseudo-labeling and allowing the network to self-learn.
no code implementations • CVPR 2022 • Xia Kong, Zuodong Gao, Xiaofan Li, Ming Hong, Jun Liu, Chengjie Wang, Yuan Xie, Yanyun Qu
Our ICCE promotes intra-class compactness with inter-class separability on both seen and unseen classes in the embedding space and visual feature space.
no code implementations • 27 Oct 2021 • Huayan Guo, Yifan Zhu, Haoyu Ma, Vincent K. N. Lau, Kaibin Huang, Xiaofan Li, Huabin Nong, Mingyu Zhou
In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL).
no code implementations • 9 Oct 2018 • Xiujun Cheng, Hui Wang, Xiao Wang, Jinqiao Duan, Xiaofan Li
We especially examine the most probable trajectories from the low-concentration state to the high-concentration state (i.e., the likely transcription regime) for certain parameters, in order to gain insight into the transcription processes and the tipping time at which transcription is likely to occur.