Search Results for author: Yan Xia

Found 83 papers, 37 papers with code

Fast and Accurate Power Load Data Completion via Regularization-optimized Low-Rank Factorization

no code implementations · 25 May 2025 · Yan Xia, Hao Feng, Hongwei Sun, Junjie Wang, Qicong Hu

Low-rank representation learning has emerged as a powerful tool for recovering missing values in power load data due to its ability to exploit the inherent low-dimensional structures of spatiotemporal measurements.

Computational Efficiency Imputation +2
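The entry above treats power load completion as recovering the missing entries of a low-rank spatiotemporal matrix. Below is a minimal sketch of generic regularized low-rank matrix factorization. It assumes a meters-by-time-slots matrix in which `None` marks a missing reading, and it fits the factors with plain stochastic gradient descent and L2 penalties. The function name `complete_low_rank` and all hyperparameters are illustrative. This is not the paper's regularization-optimized method.

```python
import random


def complete_low_rank(matrix, rank=2, reg=0.01, lr=0.02, epochs=2000, seed=0):
    """Fill `None` entries with predictions from a rank-`rank` factorization U V^T.

    Only observed entries drive the updates; `reg` is an L2 penalty on U and V.
    Illustrative sketch, not the method of any specific paper.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small positive initialization for the row (U) and column (V) factors.
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j, matrix[i][j])
        for i in range(n_rows)
        for j in range(n_cols)
        if matrix[i][j] is not None
    ]

    def predict(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, value in observed:
            err = value - predict(i, j)
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # SGD step on (value - u_i . v_j)^2 / 2 + reg / 2 * (|u_i|^2 + |v_j|^2)
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)

    # Keep observed readings as they are; impute only the missing ones.
    return [
        [matrix[i][j] if matrix[i][j] is not None else predict(i, j) for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

On a toy rank-1 matrix with one hidden entry, calling `complete_low_rank(M, rank=1)` recovers that entry approximately. The L2 penalty shrinks the estimate slightly toward zero, which is the usual trade-off between fit and robustness to noisy readings.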

A PID-Controlled Tensor Wheel Decomposition Model for Dynamic Link Prediction

no code implementations · 20 May 2025 · Qu Wang, Yan Xia

Link prediction in dynamic networks remains a fundamental challenge in network science, requiring the inference of potential interactions and their evolving strengths through spatiotemporal pattern analysis.

Dynamic Link Prediction Prediction

OPAL: Visibility-aware LiDAR-to-OpenStreetMap Place Recognition via Adaptive Radial Fusion

no code implementations · 27 Apr 2025 · Shuhao Kang, Martin Y. Liao, Yan Xia, Olaf Wysocki, Boris Jutzi, Daniel Cremers

LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments.

Autonomous Navigation

A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

no code implementations · 21 Apr 2025 · Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yan Xia, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei

Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.

Spatial Reasoning

Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots

no code implementations · 18 Apr 2025 · Zongyuan Chen, Yan Xia, Jiayuan Liu, Jijia Liu, Wenhao Tang, Jiayu Chen, Feng Gao, Longfei Ma, Hongen Liao, Yu Wang, Chao Yu, Boyu Zhang, Fei Xing

In this study, we present a soft robotic system designed for surgical applications and propose a hysteresis-aware whole-body neural network model that accurately captures and predicts the soft robot's whole-body motion, including its hysteretic behavior.

BitNet b1.58 2B4T Technical Report

no code implementations · 16 Apr 2025 · Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei

We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale.

Computational Efficiency Language Modeling +3
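The "1.58" in BitNet b1.58 refers to ternary weights in {-1, 0, +1}, each of which carries log2(3) ≈ 1.58 bits of information. Below is a minimal sketch of absmean ternarization as described for BitNet b1.58: scale a weight matrix by its mean absolute value, then round and clip to {-1, 0, +1}. The sketch uses a single per-tensor scale and pure Python lists, and the function names `absmean_ternary` and `round_half_away` are hypothetical. It is not the released training or inference code.

```python
import math

# Information content of one ternary weight: log2(3) ≈ 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (Python's round() ties to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def absmean_ternary(weights, eps=1e-5):
    """Quantize a 2-D weight matrix (list of rows) to {-1, 0, +1}.

    gamma = mean(|W|); W_q = clip(round(W / (gamma + eps)), -1, 1).
    Returns the ternary matrix and gamma, so that W is approximated by gamma * W_q.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round_half_away(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma
```

For `W = [[0.9, -0.05, 0.4], [-1.2, 0.0, 0.3]]`, the scale is gamma = 0.475 and the ternary matrix is `[[1, 0, 1], [-1, 0, 1]]`. Small weights collapse to 0, and large weights saturate at ±1.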

RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning

no code implementations · 16 Apr 2025 · Yuan Luo, Rudolf Hoffmann, Yan Xia, Olaf Wysocki, Benedikt Schwab, Thomas H. Kolbe, Daniel Cremers

Moreover, we propose a novel neural network, RADLER, leveraging the effectiveness of contrastive self-supervised learning (SSL) and semantic 3D city models to enhance radar object detection of pedestrians, cyclists, and cars.

Object object-detection +3

Reconstructing Humans with a Biomechanically Accurate Skeleton

no code implementations · CVPR 2025 · Yan Xia, Xiaowei Zhou, Etienne Vouga, QiXing Huang, Georgios Pavlakos

In this paper, we introduce a method for reconstructing 3D humans from a single image using a biomechanically accurate skeleton model.

Human Mesh Recovery Pose Estimation

Enhancing Multi-modal Models with Heterogeneous MoE Adapters for Fine-tuning

no code implementations · 26 Mar 2025 · Sashuai Zhou, Hai Huang, Yan Xia

Multi-modal models excel in cross-modal tasks but are computationally expensive due to their billions of parameters.

Mixture-of-Experts parameter-efficient fine-tuning

SparseAlign: A Fully Sparse Framework for Cooperative Object Detection

no code implementations · CVPR 2025 · Yunshuang Yuan, Yan Xia, Daniel Cremers, Monika Sester

In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features.

Autonomous Driving object-detection +1

L2RSI: Cross-view LiDAR-based Place Recognition for Large-scale Urban Scenes via Remote Sensing Imagery

no code implementations · 14 Mar 2025 · Ziwei Shi, Xiaoran Zhang, Yan Xia, Yu Zang, Siqi Shen, Cheng Wang

To overcome this, we first construct the XA-L&RSI dataset, which encompasses approximately 110,000 remote sensing submaps and 13,000 LiDAR point cloud submaps captured in urban scenes, and propose a novel method, L2RSI, for cross-view LiDAR place recognition using high-resolution Remote Sensing Imagery.

Cross-modal place recognition Retrieval

CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving

no code implementations · 9 Mar 2025 · Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll

By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes.

2D Semantic Segmentation 4D reconstruction +3

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

no code implementations · 7 Mar 2025 · Guanghao Zhang, Tao Zhong, Yan Xia, Zhelun Yu, Haoyuan Li, Wanggui He, Fangxun Shu, Mushui Liu, Dong She, Yi Wang, Hao Jiang

CMMCoT constructs interleaved multimodal multi-step reasoning chains that use critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals.

Image Comprehension Memorization

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration

1 code implementation · 20 Feb 2025 · Minjie Hong, Yan Xia, Zehan Wang, Jieming Zhu, Ye Wang, Sihang Cai, Xiaoda Yang, Quanyu Dai, Zhenhua Dong, Zhimeng Zhang, Zhou Zhao

Large language models (LLMs) are increasingly leveraged as foundational backbones in the development of advanced recommender systems, offering enhanced capabilities through their extensive knowledge and reasoning.

Decoder Recommendation Systems

FacaDiffy: Inpainting Unseen Facade Parts Using Diffusion Models

1 code implementation · 20 Feb 2025 · Thomas Froech, Olaf Wysocki, Yan Xia, Junyu Xie, Benedikt Schwab, Daniel Cremers, Thomas H. Kolbe

To address this challenge, we introduce FacaDiffy, a novel method for inpainting unseen facade parts by completing conflict maps with a personalized Stable Diffusion model.

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

1 code implementation · 17 Feb 2025 · Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs.

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

1 code implementation · 13 Jan 2025 · Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

Ultimately, MVoT establishes new possibilities for complex reasoning tasks where visual thinking can effectively complement verbal reasoning.

Spatial Reasoning

Semantic Residual for Multimodal Unified Discrete Representation

no code implementations · 26 Dec 2024 · Hai Huang, Shulei Wang, Yan Xia

Recent research on multimodal unified representations predominantly employs codebooks as the representation form, using Vector Quantization (VQ) for quantization, yet other quantization-based representation forms remain insufficiently explored.

Disentanglement Quantization +1
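The entry above contrasts its approach with codebook representations built by Vector Quantization (VQ). Below is a minimal sketch of the VQ assignment step for readers unfamiliar with that baseline: each vector is replaced by its nearest codeword under squared Euclidean distance. The function name `vector_quantize` and the toy codebook are illustrative. Codebook learning (for example, k-means or straight-through training) is omitted, and the sketch does not implement the paper's semantic-residual method.

```python
def vector_quantize(vectors, codebook):
    """Map each vector to the index and value of its nearest codeword.

    Distance is squared Euclidean; ties go to the lowest codeword index.
    """
    codes, quantized = [], []
    for v in vectors:
        # Squared distance from v to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        idx = min(range(len(codebook)), key=dists.__getitem__)
        codes.append(idx)
        quantized.append(list(codebook[idx]))
    return codes, quantized
```

With codebook `[[0, 0], [1, 1], [5, 5]]`, the vectors `[0.2, -0.1]`, `[0.9, 1.3]` and `[4, 6]` map to codes `0`, `1` and `2`. The discrete codes, rather than the raw vectors, are what a unified codebook representation shares across modalities.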

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

no code implementations · 13 Dec 2024 · Yan Xia, Yunxiang Lu, Rui Song, Oussema Dhaouadi, João F. Henriques, Daniel Cremers

To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla.

Contrastive Learning Image to Point Cloud Registration

SADG: Segment Any Dynamic Gaussian Without Object Trackers

1 code implementation · 28 Nov 2024 · Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers

We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs.

3D Reconstruction Autonomous Driving +3

1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

1 code implementation · 21 Oct 2024 · Jinheng Wang, Hansong Zhou, Ting Song, Shaoguang Mao, Shuming Ma, Hongyu Wang, Yan Xia, Furu Wei

Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption.

CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays

1 code implementation · 29 Sep 2024 · Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia

Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks.

World-Grounded Human Motion Recovery via Gravity-View Coordinates

no code implementations · 10 Sep 2024 · Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Instead, we propose estimating human poses in a novel Gravity-View (GV) coordinate system, which is defined by the world gravity and the camera view direction.

PSLF: A PID Controller-incorporated Second-order Latent Factor Analysis Model for Recommender System

no code implementations · 31 Aug 2024 · Jialiang Wang, Yan Xia, Ye Yuan

A second-order-based latent factor (SLF) analysis model demonstrates superior performance in graph representation learning, particularly for high-dimensional and incomplete (HDI) interaction data, by incorporating the curvature information of the loss landscape.

Graph Representation Learning Recommendation Systems

L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

1 code implementation · 7 Aug 2024 · Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang

However, the fusion of LiDAR and 4D radar is challenging because they differ significantly in terms of data quality and the degree of degradation in adverse weather.

Autonomous Navigation Denoising +4

Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection

1 code implementation · 17 Jul 2024 · Hu Cao, Zehua Zhang, Yan Xia, Xinyi Li, Jiahao Xia, Guang Chen, Alois Knoll

The core concept is the design of the coarse-to-fine fusion module, denoted as the cross-modality adaptive feature refinement (CAFR) module.

object-detection Object Detection

Gap Completion in Point Cloud Scene occluded by Vehicles using SGC-Net

no code implementations · 11 Jul 2024 · Yu Feng, Yiming Xu, Yan Xia, Claus Brenner, Monika Sester

In this study, we present a novel approach that leverages deep neural networks to learn a model capable of filling gaps in urban scenes that are obscured by vehicle occlusion.

Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

no code implementations · 8 Jul 2024 · Yadong Zhang, Shaoguang Mao, Wenshan Wu, Yan Xia, Tao Ge, Man Lan, Furu Wei

This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models.

Decision Making Language Modeling +1

TARGO: Benchmarking Target-driven Object Grasping under Occlusions

no code implementations · 8 Jul 2024 · Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers

3) We also generate a large-scale training dataset via a scalable pipeline; it can be used to boost grasping performance under occlusion and generalizes to the real world.

Benchmarking Object +1

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

no code implementations · 25 Jun 2024 · Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries.

Cross-Modal Retrieval Natural Language Queries +2

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

1 code implementation · 20 Jun 2024 · Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem.

Retrieval Sequential Recommendation

Meta Reasoning for Large Language Models

no code implementations · 17 Jun 2024 · Peizhong Gao, Ao Xie, Shaoguang Mao, Wenshan Wu, Yan Xia, Haipeng Mi, Furu Wei

MRP represents a significant advancement in enabling LLMs to identify cognitive challenges across problems and leverage benefits across different reasoning approaches, enhancing their ability to handle diverse and complex problem domains efficiently.

Computational Efficiency In-Context Learning

Localizing Events in Videos with Multimodal Queries

no code implementations · CVPR 2025 · Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu

Localizing events in videos based on semantic queries is a pivotal task in video understanding, with the growing significance of user-oriented applications like video search.

Natural Language Queries Video Understanding

Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

1 code implementation · 4 Apr 2024 · Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei

Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks.

Spatial Reasoning Visual Navigation

LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

no code implementations · 1 Apr 2024 · Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.

Decision Making

VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

1 code implementation · 21 Mar 2024 · Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers

To tackle this issue, we propose Voxel-Cross-Pixel (VXP), a novel camera-to-LiDAR place recognition framework that enforces local similarities in a self-supervised manner and effectively brings global context from images and LiDAR scans into a shared feature space.

Cross-modal place recognition Cross-Modal Retrieval +1

Unsupervised Domain Adaptation for Brain Vessel Segmentation through Transwarp Contrastive Learning

1 code implementation · 23 Feb 2024 · Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Kun Wu, Nishant Ravikumar, Alejandro F. Frangi

Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models.

Contrastive Learning Medical Image Analysis +1

K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning

no code implementations · 2 Feb 2024 · Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei

LLMs and LLM agents often struggle with strategic reasoning due to the absence of a reasoning framework that enables them to dynamically infer others' perspectives and adapt to changing environments.

Decision Making Language Modelling +1

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

no code implementations · 21 Dec 2023 · Haifeng Huang, Yang Zhao, Zehan Wang, Yan Xia, Zhou Zhao

Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not.

Unsupervised Domain Adaptation Video Grounding

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

1 code implementation · 17 Dec 2023 · Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.

Quantization Singing Voice Synthesis +1

Text2Loc: 3D Point Cloud Localization from Natural Language

1 code implementation · CVPR 2024 · Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers

We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text.

Contrastive Learning Visual Place Recognition

Multi-view Hybrid Graph Convolutional Network for Volume-to-mesh Reconstruction in Cardiovascular MRI

1 code implementation · 22 Nov 2023 · Nicolás Gaggion, Benjamin A. Matheson, Yan Xia, Rodrigo Bonazzola, Nishant Ravikumar, Zeike A. Taylor, Diego H. Milone, Alejandro F. Frangi, Enzo Ferrante

In response, we introduce HybridVNet, a novel architecture for direct image-to-mesh extraction seamlessly integrating standard convolutional neural networks with graph convolutions, which we prove can efficiently handle surface and volumetric meshes by encoding them as graph structures.

Anatomy

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks

1 code implementation · NeurIPS 2023 · Haoyi Duan, Yan Xia, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao

This mechanism leverages audio and visual modalities as soft prompts to dynamically adjust the parameters of pre-trained models based on the current multi-modal input features.

ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents

1 code implementation · 6 Nov 2023 · Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei

This paper introduces Alympics (Olympics for Agents), a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.

Decision Making Language Modeling +2

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

no code implementations · 12 Oct 2023 · Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.

Form In-Context Learning +1

SCP: Scene Completion Pre-training for 3D Object Detection

no code implementations · 12 Sep 2023 · Yiming Shan, Yan Xia, Yuhong Chen, Daniel Cremers

In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data.

3D Object Detection Autonomous Driving +2

Learned Local Attention Maps for Synthesising Vessel Segmentations

no code implementations · 24 Aug 2023 · Yash Deo, Rodrigo Bonazzola, Haoran Dou, Yan Xia, Tianyou Wei, Nishant Ravikumar, Alejandro F. Frangi, Toni Lassila

We present an encoder-decoder model for synthesising segmentations of the main cerebral arteries in the circle of Willis (CoW) from only T2 MRI.

Decoder Diagnostic

Temporal Fact Reasoning over Hyper-Relational Knowledge Graphs

1 code implementation · 14 Jul 2023 · Zifeng Ding, Jingcheng Wu, Jingpei Wu, Yan Xia, Volker Tresp

We develop two new benchmark HTKG datasets, i.e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models hyper-relational temporal facts.

Knowledge Graphs Link Prediction +1

Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

no code implementations · 8 Jun 2023 · Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien

To leverage NLP models, speech input is first force-aligned with texts, and then pre-processed into a token sequence, including words and phrase break information.

text-classification Text Classification

Smart Word Suggestions for Writing Assistance

1 code implementation · 17 May 2023 · Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao

The training dataset comprises over 3.7 million sentences and 12.7 million suggestions generated through rules.

Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting

1 code implementation · 11 May 2023 · Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei

Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.

All Arithmetic Reasoning +2

Low-code LLM: Graphical User Interface over Large Language Models

2 code implementations · 17 Apr 2023 · Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.

Prompt Engineering

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

no code implementations · 29 Mar 2023 · Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.

Code Generation Common Sense Reasoning +1

High-throughput 3DRA segmentation of brain vasculature and aneurysms using deep learning

1 code implementation · Computer Methods and Programs in Biomedicine 2023 · Fengming Lin, Yan Xia, Shuang Song, Nishant Ravikumar, Alejandro F Frangi

Results: On the internal clinical dataset, our method consistently outperformed several state-of-the-art approaches for vessel and aneurysm segmentation. For aneurysm segmentation it achieved an average Dice score of 0.81 (0.15 higher than nnUNet) and an average surface-to-surface error of 0.20 mm, which is below the in-plane resolution of 0.35 mm/pixel. For vessel segmentation it achieved an average Dice score of 0.91 and an average surface-to-surface error of 0.25 mm.

Segmentation
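The results above are reported as Dice scores. For a predicted mask P and a reference mask T, Dice = 2|P ∩ T| / (|P| + |T|), which ranges from 0 (no overlap) to 1 (identical masks). Below is a minimal sketch for binary masks flattened to 0/1 sequences. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions. The surface-to-surface error also reported in the entry is not covered.

```python
def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground voxel counts of each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a common convention)
    return 2.0 * intersection / total
```

For example, `dice_score([1, 1, 0, 0], [1, 0, 1, 0])` gives 0.5: one shared foreground voxel against two foreground voxels in each mask. A 3D segmentation volume can be scored the same way after flattening it into a single sequence.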

CASSPR: Cross Attention Single Scan Place Recognition

1 code implementation · ICCV 2023 · Yan Xia, Mariia Gladkova, Rui Wang, Qianyun Li, Uwe Stilla, João F. Henriques, Daniel Cremers

CASSPR uses queries from one branch to try to match structures in the other branch, ensuring that both extract self-contained descriptors of the point cloud (rather than one branch dominating), but using both to inform the output global descriptor of the point cloud.

Video-Guided Curriculum Learning for Spoken Video Grounding

1 code implementation · 1 Sep 2022 · Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise.

Video Grounding

A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds

1 code implementation · 8 Mar 2022 · Yan Xia, Qiangqiang Wu, Wei Li, Antoni B. Chan, Uwe Stilla

Recent works on 3D single object tracking treat the task as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for the tracking.

3D Single Object Tracking motion prediction +1

Cross-Modal Background Suppression for Audio-Visual Event Localization

1 code implementation · CVPR 2022 · Yan Xia, Zhou Zhao

Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.

audio-visual event localization

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

no code implementations · 14 Oct 2021 · Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.

Rapid Assessments of Light-Duty Gasoline Vehicle Emissions Using On-Road Remote Sensing and Machine Learning

no code implementations · 1 Oct 2021 · Yan Xia, Linhui Jiang, Lu Wang, Xue Chen, Jianjie Ye, Tangyan Hou, Liqiang Wang, Yibo Zhang, Mengying Li, Zhen Li, Zhe Song, Yaping Jiang, Weiping Liu, Pengfei Li, Daniel Rosenfeld, John H. Seinfeld, Shaocai Yu

Our results show that the ORRS measurements, assisted by the machine-learning-based ensemble model developed here, can realize day-to-day supervision of on-road vehicle-specific emissions.

A Deep Discontinuity-Preserving Image Registration Network

1 code implementation · 9 Jul 2021 · Xiang Chen, Nishant Ravikumar, Yan Xia, Alejandro F Frangi

Image registration aims to establish spatial correspondence across pairs or groups of images, and is a cornerstone of medical image computing and computer-assisted interventions.

Image Registration Medical Image Registration +1

ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion

1 code implementation · 19 Apr 2021 · Yaqi Xia, Yan Xia, Wei Li, Rui Song, Kailang Cao, Uwe Stilla

We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed ASFM-Net.

Point Cloud Completion

SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition

1 code implementation · CVPR 2021 · Yan Xia, Yusheng Xu, Shuang Li, Rui Wang, Juan Du, Daniel Cremers, Uwe Stilla

We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors.

3D Place Recognition Metric Learning +1

Improving pronunciation assessment via ordinal regression with anchored reference samples

no code implementations · 26 Oct 2020 · Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu

Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot easily be translated into a sentence score with a simple average for effective assessment; 2) rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.

regression Sentence

VPC-Net: Completion of 3D Vehicles from MLS Point Clouds

1 code implementation · 8 Aug 2020 · Yan Xia, Yusheng Xu, Cheng Wang, Uwe Stilla

Moreover, a new refiner module is also presented to preserve the vehicle details from inputs and refine the complete outputs with fine-grained information.

Autonomous Driving

RealPoint3D: Point Cloud Generation from a Single Image with Complex Background

1 code implementation · 8 Sep 2018 · Yan Xia, Yang Zhang, Dingfu Zhou, Xinyu Huang, Cheng Wang, Ruigang Yang

Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud.

3D Generation Point Cloud Generation

Mixed one-bit compressive sensing with applications to overexposure correction for CT reconstruction

no code implementations · 3 Jan 2017 · Xiaolin Huang, Yan Xia, Lei Shi, Yixing Huang, Ming Yan, Joachim Hornegger, Andreas Maier

Aiming at overexposure correction for computed tomography (CT) reconstruction, in this paper we propose mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements.

Compressive Sensing Computed Tomography (CT) +2

Learning Discriminative Reconstructions for Unsupervised Outlier Removal

no code implementations · ICCV 2015 · Yan Xia, Xudong Cao, Fang Wen, Gang Hua, Jian Sun

We study the problem of automatically removing outliers from noisy data, with application to removing outlier images from an image collection.
