Search Results for author: Xiaofan Li

Found 22 papers, 7 papers with code

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

1 code implementation • 17 Mar 2025 • Dingkang Liang, Dingyuan Zhang, Xin Zhou, Sifan Tu, Tianrui Feng, Xiaofan Li, Yumeng Zhang, Mingyang Du, Xiao Tan, Xiang Bai

Extensive experiments on the nuScenes dataset demonstrate that UniFuture outperforms specialized models on future generation and perception tasks, highlighting the advantages of a unified, structurally-aware world model.

Future prediction Scene Generation

AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU

no code implementations • 10 Mar 2025 • Zhuowen Zheng, Yain-Whar Si, Xiaochen Yuan, Junwei Duan, Ke Wang, Xiaofan Li, Xinyuan Zhang, Xueyuan Gong

Nowadays, with the advancement of deep neural networks (DNNs) and the availability of large-scale datasets, face recognition (FR) models have achieved exceptional performance.

Face Recognition

RVAFM: Re-parameterizing Vertical Attention Fusion Module for Handwritten Paragraph Text Recognition

no code implementations • 5 Mar 2025 • Jinhui Zheng, Zhiquan Liu, Yain-Whar Si, Jianqing Li, Xinyuan Zhang, Xiaofan Li, HaoZhi Huang, Xueyuan Gong

One of the most advanced models for this task is the Vertical Attention Network (VAN), which utilizes a Vertical Attention Module (VAM) to implicitly segment paragraph text images into text lines, thereby reducing the difficulty of the recognition task.
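
To make the vertical-attention idea concrete, the sketch below shows a toy module that softmax-weights the rows (height dimension) of a paragraph feature map to pool out one line's features. It is a generic illustration under assumed shapes and names, not the VAN or RVAFM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VerticalAttention(nn.Module):
    """Toy vertical attention: collapses a 2-D feature map (H x W) into a
    single line feature by softmax-weighting the rows (height dimension).
    Illustrative only; not the VAN/RVAFM implementation."""

    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),  # one attention logit per spatial cell
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) paragraph-level features
        logits = self.score(feats).mean(dim=3)        # (B, 1, H): pool logits over width
        attn = F.softmax(logits, dim=2).unsqueeze(3)  # (B, 1, H, 1): one weight per row
        line_feat = (feats * attn).sum(dim=2)         # (B, C, W): weighted sum over height
        return line_feat                              # sequence features for one text line

# Usage: the real model applies this repeatedly (with recurrent state) to read lines top-to-bottom.
x = torch.randn(2, 128, 32, 256)
print(VerticalAttention(128)(x).shape)  # torch.Size([2, 128, 256])
```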

DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance

1 code implementation • 5 Mar 2025 • Zhao Yang, Zezhong Qian, Xiaofan Li, Weixiang Xu, Gongpeng Zhao, Ruohong Yu, Lingsi Zhu, Longjun Liu

In this work, we present DualDiff, a dual-branch conditional diffusion model designed to enhance driving scene generation across multiple views and video sequences.

3D Object Detection BEV Segmentation +3

AdaSin: Enhancing Hard Sample Metrics with Dual Adaptive Penalty for Face Recognition

no code implementations • 5 Mar 2025 • Qiqi Guo, Zhuowen Zheng, Guanghua Yang, Zhiquan Liu, Xiaofan Li, Jianqing Li, Jinyu Tian, Xueyuan Gong

It enables the model to focus more effectively on hard samples in later training stages, leading to the extraction of highly discriminative face features.

Face Recognition
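
As a rough illustration of the kind of difficulty-adaptive margin the title refers to, the sketch below scales the additive penalty on the target class with per-sample hardness and training progress. The scheduling rule, constants, and names are assumptions, not AdaSin's actual formulation.

```python
import torch
import torch.nn.functional as F

def adaptive_margin_logits(embeddings, class_weights, labels,
                           base_margin=0.35, scale=64.0, progress=0.0):
    """Cosine logits with a per-sample additive margin on the target class.
    Hard samples (low cosine to their class center) receive a larger penalty,
    and the effect ramps up with training progress in [0, 1].
    Generic illustration only -- not the AdaSin formulation."""
    emb = F.normalize(embeddings, dim=1)              # (B, D)
    w = F.normalize(class_weights, dim=1)             # (C, D)
    cos = emb @ w.t()                                 # (B, C) cosine similarities
    target = cos.gather(1, labels.view(-1, 1))        # cosine to the ground-truth class

    hardness = (1.0 - target).clamp(min=0.0)          # larger for harder samples
    margin = base_margin + 0.2 * progress * hardness  # dual-factor adaptive penalty

    logits = cos.scatter(1, labels.view(-1, 1), target - margin)
    return scale * logits                             # feed to F.cross_entropy(logits, labels)
```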

One-for-More: Continual Diffusion Model for Anomaly Detection

1 code implementation • 27 Feb 2025 • Xiaofan Li, Xin Tan, Zhuo Chen, Zhizhong Zhang, Ruixin Zhang, rizen guo, Guanna Jiang, Yulong Chen, Yanyun Qu, Lizhuang Ma, Yuan Xie

Finally, considering the risk of the diffusion model "over-fitting" to normal images, we propose an anomaly-masked network to enhance its conditioning mechanism.

Anomaly Detection continual anomaly detection +2

The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey

1 code implementation • 14 Feb 2025 • Sifan Tu, Xin Zhou, Dingkang Liang, Xingyu Jiang, Yumeng Zhang, Xiaofan Li, Xiang Bai

The Driving World Model (DWM), which focuses on predicting scene evolution during the driving process, has emerged as a promising paradigm in the pursuit of autonomous driving.

Autonomous Driving Survey

Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting

no code implementations • 30 Jan 2025 • Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

It enables users to conveniently specify the desired editing region and dragging direction by providing 3D masks and pairs of control points, giving precise control over the extent of editing.

3DGS 3D scene Editing

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities

no code implementations • 21 Dec 2024 • Huan Liu, Lingyu Xiao, JiangJiang Liu, Xiaofan Li, Ze Feng, Sen yang, Jingdong Wang

To understand the factors driving this improvement, we conduct an in-depth analysis of the network architecture, data selection, and training recipe used in public MLLMs.

Attribute Classification +4

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

1 code implementation • 18 Dec 2024 • Yanpeng Sun, Jing Hao, Ke Zhu, Jiang-Jiang Liu, Yuxiang Zhao, Xiaofan Li, Gang Zhang, Zechao Li, Jingdong Wang

We propose to leverage off-the-shelf visual specialists, models originally trained on annotated images for tasks other than image captioning, to enhance image captions.

Descriptive Human-Object Interaction Detection +2

Physical Layer Security in AmBC-NOMA Networks with Random Eavesdroppers

no code implementations • 9 Dec 2024 • Xinyue Pei, Xingwei Wang, Min Huang, Yingyang Chen, Xiaofan Li, Theodoros A. Tsiftsis

To address this challenge, the BS injects artificial noise (AN) to mislead the Eves, and a protected zone is employed to create an Eve-exclusion area around the BS.
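
For context, the quantity typically analyzed in such physical-layer-security setups is the secrecy rate. A generic textbook form (not the paper's AmBC-NOMA-specific expressions), with an artificial-noise power fraction $\phi$ assumed nulled at the legitimate receiver B and $[x]^{+}=\max(x,0)$, is:

```latex
% Generic secrecy rate of legitimate user B against an eavesdropper E.
% P_s: transmit power, h_B/h_E: signal channels, g_E: AN channel to E, \phi: AN power fraction.
C_s = \Bigl[\log_2\bigl(1+\gamma_B\bigr) - \log_2\bigl(1+\gamma_E\bigr)\Bigr]^{+},
\qquad
\gamma_B = \frac{(1-\phi)\,P_s\,|h_B|^2}{\sigma^2},
\qquad
\gamma_E = \frac{(1-\phi)\,P_s\,|h_E|^2}{\phi\,P_s\,|g_E|^2 + \sigma^2}.
```

The injected AN lowers $\gamma_E$ only, while the protected zone keeps eavesdroppers far enough from the BS to further weaken $|h_E|$.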

Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

1 code implementation • 24 Sep 2024 • Lingyu Xiao, Jiang-Jiang Liu, Sen yang, Xiaofan Li, Xiaoqing Ye, Wankou Yang, Jingdong Wang

In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses.

Autonomous Driving Imitation Learning +1

NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection

no code implementations • 22 Apr 2024 • Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, Liujuan Cao

Previous works (e.g., NeRF-Det) have demonstrated that implicit representations can benefit visual 3D perception in indoor scenes with a high amount of overlap between input images.

3D Object Detection NeRF +3

Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space

no code implementations • 13 Apr 2024 • Zhuyang Xie, Yan Yang, Jie Wang, Xiaorong Liu, Xiaofan Li

To address the aforementioned problems, we propose a trustworthy multimodal sentiment ordinal network (TMSON) to improve performance in sentiment analysis.

Multimodal Sentiment Analysis

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

1 code implementation • CVPR 2024 • Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma

Vision-language models have brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering.

Anomaly Detection Language Modeling +2

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

no code implementations • 11 Oct 2023 • Xiaofan Li, Yifu Zhang, Xiaoqing Ye

To alleviate the problem, we propose DrivingDiffusion, a spatial-temporally consistent diffusion framework, to generate realistic multi-view videos controlled by 3D layout.

Autonomous Driving Image Generation +1

USL-Net: Uncertainty Self-Learning Network for Unsupervised Skin Lesion Segmentation

no code implementations • 23 Sep 2023 • Xiaofan Li, Bo Peng, Jie Hu, Changyou Ma, DaiPeng Yang, Zhuyang Xie

Rather than risk potential pseudo-labeling errors or learning confusion by forcefully classifying these regions, we consider them as uncertainty regions, exempting them from pseudo-labeling and allowing the network to self-learn.

Contrastive Learning Lesion Segmentation +2
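
To show what exempting uncertainty regions from pseudo-labeling can look like in practice, here is a minimal sketch of a loss that supervises only confidently-foreground or confidently-background pixels. The thresholds and names are assumptions, and this is not USL-Net's implementation.

```python
import torch
import torch.nn.functional as F

def uncertainty_masked_bce(pred_logits, saliency, low=0.3, high=0.7):
    """Binary pseudo-label loss that skips uncertain pixels.
    Pixels with saliency in (low, high) are treated as uncertainty regions
    and excluded from supervision, letting the network self-learn there.
    Sketch only -- not the USL-Net implementation."""
    pseudo = (saliency >= high).float()               # confident foreground -> label 1
    certain = (saliency <= low) | (saliency >= high)  # confident background or foreground
    loss = F.binary_cross_entropy_with_logits(pred_logits, pseudo, reduction="none")
    loss = loss[certain]                              # drop uncertainty-region pixels
    return loss.mean() if loss.numel() > 0 else loss.sum()

# Usage: saliency can come from any cheap prior (e.g., a contrastive or color cue),
# while pred_logits are the segmentation network's raw outputs.
```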

En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning

no code implementations • CVPR 2022 • Xia Kong, Zuodong Gao, Xiaofan Li, Ming Hong, Jun Liu, Chengjie Wang, Yuan Xie, Yanyun Qu

Our ICCE promotes intra-class compactness with inter-class separability on both seen and unseen classes in the embedding space and visual feature space.

Generalized Zero-Shot Learning

Over-the-Air Aggregation for Federated Learning: Waveform Superposition and Prototype Validation

no code implementations • 27 Oct 2021 • Huayan Guo, Yifan Zhu, Haoyu Ma, Vincent K. N. Lau, Kaibin Huang, Xiaofan Li, Huabin Nong, Mingyu Zhou

In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL).

Federated Learning
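
As a hedged sketch of the waveform-superposition idea behind over-the-air aggregation (a single-subcarrier toy with channel-inversion power control; names and constants are assumptions, and this is not the paper's OFDM prototype):

```python
import numpy as np

def ota_aggregate(local_updates, channels, noise_std=0.05, p_max=1.0):
    """Toy over-the-air aggregation on one subcarrier.
    Each device pre-scales its update by the inverse of its channel so the
    transmitted waveforms add coherently; the receiver sees the superposed
    sum plus noise and recovers the average update in one shot.
    Generic sketch, not the paper's OFDM/OTA prototype."""
    k = len(local_updates)
    eta = min(np.sqrt(p_max) * abs(h) for h in channels)  # limited by the weakest channel
    tx = [eta * u / h for u, h in zip(local_updates, channels)]  # channel-inversion precoding
    rx = sum(h * x for h, x in zip(channels, tx))                # superposition "in the air"
    rx = rx + noise_std * np.random.randn(*np.shape(local_updates[0]))
    return np.real(rx) / (eta * k)                               # noisy estimate of the mean update

# Example: three devices with complex fading channels.
updates = [np.array([1.0, 2.0]), np.array([0.5, 1.5]), np.array([1.5, 2.5])]
h = [0.8 + 0.2j, 1.1 - 0.3j, 0.9 + 0.1j]
print(ota_aggregate(updates, h))  # close to [1.0, 2.0]
```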

Most Probable Evolution Trajectories in a Genetic Regulatory System Excited by Stable Lévy Noise

no code implementations • 9 Oct 2018 • Xiujun Cheng, Hui Wang, Xiao Wang, Jinqiao Duan, Xiaofan Li

We especially examine the most probable trajectories from the low-concentration state to the high-concentration state (i.e., the likely transcription regime) for certain parameters, in order to gain insight into the transcription process and the tipping time at which transcription is likely to occur.
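
For readers outside this area, such studies typically consider a scalar gene-regulation model driven by a symmetric α-stable Lévy process; a schematic form (the specific drift and parameter values are given in the paper) is:

```latex
% Schematic dynamics: deterministic regulatory drift f(x) plus small alpha-stable Levy noise.
\mathrm{d}X_t = f(X_t)\,\mathrm{d}t + \varepsilon\,\mathrm{d}L_t^{\alpha}, \qquad 0 < \alpha < 2 .
```

The most probable trajectory is then commonly taken as the maximizer of the probability density, $x_m(t) = \arg\max_x p(x,t)$, where $p(x,t)$ solves the nonlocal Fokker-Planck equation associated with the Lévy noise.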
