Search Results for author: Xinyang Li

Found 30 papers, 11 papers with code

AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis

no code implementations • 10 Mar 2025 • Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji

While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle.

Diversity General Knowledge

Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting

no code implementations • 30 Jan 2025 • Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

It enables users to conveniently specify the desired editing region and the desired dragging direction through the input of 3D masks and pairs of control points, thereby enabling precise control over the extent of editing.

3DGS 3D scene Editing

SE(3)-Based Trajectory Optimization and Target Tracking in UAV-Enabled ISAC Systems

no code implementations • 20 Jan 2025 • Dongxiao Xu, Xinyang Li, Vlad C. Andrei, Moritz Wiese, Ullrich J. Moenich, Holger Boche

This paper presents a novel approach to enhance sensing capabilities in UAV-enabled MIMO-OFDM ISAC systems by leveraging UAV mobility as a mono-static radar.

ISAC parameter estimation

Generative AI for Cel-Animation: A Survey

1 code implementation • 8 Jan 2025 • Yunlong Tang, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Yunzhong Xiao, Chao Huang, Luchuan Song, Susan Liang, Yizhi Song, Liu He, Jing Bi, Mingqian Feng, Xinyang Li, Zeliang Zhang, Chenliang Xu

The traditional Celluloid (Cel) Animation production pipeline encompasses multiple essential steps, including storyboarding, layout design, keyframe animation, inbetweening, and colorization, which demand substantial manual effort, technical expertise, and significant time investment.

Colorization Layout Design +1

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation

no code implementations • 30 Dec 2024 • Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, Yiyi Liao

In this work, we introduce Prometheus, a 3D-aware latent diffusion model for text-to-3D generation at both object and scene levels in seconds.

3D Generation Scene Generation +2

Look a Group at Once: Multi-Slide Modeling for Survival Prediction

no code implementations • 18 Nov 2024 • Xinyang Li, Yi Zhang, Yi Xie, Jianfei Yang, Xi Wang, Hao Chen, Haixian Zhang

In this paper, we introduce GroupMIL, a novel framework inspired by the clinical practice of collective analysis, which models multiple slides as a single sample and organizes groups of patches and slides sequentially to capture cross-slide prognostic features.

Survival Prediction

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

no code implementations • 17 Nov 2024 • Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu

The advancement of Multimodal Large Language Models (MLLMs) has enabled significant progress in multimodal understanding, expanding their capacity to analyze video content.

Multiple-choice

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

no code implementations • 8 Oct 2024 • Gongxin Yao, Xinyang Li, Luowei Fu, Yu Pan

To this end, one of the key challenges is cross-modal place recognition, which involves retrieving 3D scenes (point clouds) from a LiDAR map according to online RGB images.

Camera Localization Contrastive Learning +2

CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration

no code implementations • 5 Aug 2024 • Gongxin Yao, Yixin Xuan, Xinyang Li, Yu Pan

Image-to-point cloud registration aims to determine the relative camera pose of an RGB image with respect to a point cloud.

Camera Localization Image to Point Cloud Registration +1

MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval

no code implementations • 5 Aug 2024 • Gongxin Yao, Xinyang Li, Yixin Xuan, Yu Pan

Image-to-point cloud registration seeks to estimate their relative camera pose, which remains an open question due to the data modality gaps.

Image to Point Cloud Registration Pose Retrieval +1

R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

no code implementations • 16 Jul 2024 • Aladin Djuhera, Vlad C. Andrei, Xinyang Li, Ullrich J. Mönich, Holger Boche, Walid Saad

In this paper, rigorous insights are provided into the influence of jamming LLM word embeddings in SFL by deriving an expression for the ML training loss divergence and showing that it is upper-bounded by the mean squared error (MSE).

Federated Learning Scheduling +1

FAGhead: Fully Animate Gaussian Head from Monocular Videos

no code implementations • 27 Jun 2024 • Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality.

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

1 code implementation • 25 Jun 2024 • Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions.

3D Generation Denoising +2

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

no code implementations • 27 May 2024 • Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane.

3DGS feature selection +3

GGAvatar: Geometric Adjustment of Gaussian Head Avatar

no code implementations • 20 May 2024 • Xinyang Li, Jiaxin Wang, Yixin Xuan, Gongxin Yao, Yu Pan

We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations.

MORPH

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

no code implementations • 16 May 2024 • Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from texts in only 1 minute. The key component is a dual-mode multi-view latent diffusion model.

3D Generation Denoising +1

NeRF-DetS: Enhanced Adaptive Spatial-wise Sampling and View-wise Fusion Strategies for NeRF-based Indoor Multi-view 3D Object Detection

no code implementations • 22 Apr 2024 • Chi Huang, Xinyang Li, Yansong Qu, Changli Wu, Xiaofan Li, Shengchuan Zhang, Liujuan Cao

Previous works (e.g., NeRF-Det) have demonstrated that implicit representation has the capacity to benefit the visual 3D perception task in indoor scenes with a high amount of overlap between input images.

3D Object Detection NeRF +3

StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views

1 code implementation • 8 Jun 2023 • Jianfei Guo, Nianchen Deng, Xinyang Li, Yeqi Bai, Botian Shi, Chiyu Wang, Chenjing Ding, Dongliang Wang, Yikang Li

We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely-used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data.

Autonomous Driving Neural Rendering +2

Monitoring the evolution of dimensional accuracy and product properties in property-controlled forming processes

no code implementations • 31 May 2023 • Sophie Charlotte Stebner, Juri Martschin, Bahman Arian, Stefan Dietrich, Martin Feistle, Sebastian Hütter, Rémi Lafarge, Robert Laue, Xinyang Li, Christopher Schulte, Daniel Spies, Ferdinand Thein, Frank Wendler, Malte Wrobel, Julian Rozo Vasquez, Michael Dölz, Sebastian Münstermann

However, a closed-loop control that can adjust and manipulate the process actuators according to the required product properties of the component will lead to a considerable increase in efficiency of the processes regarding resources and will decrease postproduction of the component.

MGR: Multi-generator Based Rationalization

1 code implementation • 8 May 2023 • Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Xinyang Li, Yuankai Zhang, Yang Qiu

Rationalization employs a generator and a predictor to construct a self-explaining NLP model in which the generator selects a subset of human-intelligible pieces of the input text and passes them to the following predictor.

Sensing-Assisted Receivers for Resilient-By-Design 6G MU-MIMO Uplink

no code implementations • 14 Feb 2023 • Vlad C. Andrei, Xinyang Li, Ullrich J. Mönich, Holger Boche

We address the resilience of future 6G MIMO communications by considering an uplink scenario where multiple legitimate transmitters try to communicate with a base station in the presence of an adversarial jammer.

Image-to-image Translation via Hierarchical Style Disentanglement

1 code implementation • CVPR 2021 • Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji

Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks.

Disentanglement Multimodal Unsupervised Image-To-Image Translation +1

Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

1 code implementation • 29 Apr 2019 • Xinyang Li, Jie Hu, Shengchuan Zhang, Xiaopeng Hong, Qixiang Ye, Chenglin Wu, Rongrong Ji

In particular, AGUIT benefits in two ways: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data, and then reconstructing the unlabeled data by a cycle consistency operation.

Attribute Disentanglement +2
