no code implementations • 25 May 2025 • Yan Xia, Hao Feng, Hongwei Sun, Junjie Wang, Qicong Hu
Low-rank representation learning has emerged as a powerful tool for recovering missing values in power load data due to its ability to exploit the inherent low-dimensional structures of spatiotemporal measurements.
no code implementations • 20 May 2025 • Qu Wang, Yan Xia
Link prediction in dynamic networks remains a fundamental challenge in network science, requiring the inference of potential interactions and their evolving strengths through spatiotemporal pattern analysis.
no code implementations • 27 Apr 2025 • Shuhao Kang, Martin Y. Liao, Yan Xia, Olaf Wysocki, Boris Jutzi, Daniel Cremers
LiDAR place recognition is a critical capability for autonomous navigation and cross-modal localization in large-scale outdoor environments.
no code implementations • 21 Apr 2025 • Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yan Xia, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.
no code implementations • 18 Apr 2025 • Zongyuan Chen, Yan Xia, Jiayuan Liu, Jijia Liu, Wenhao Tang, Jiayu Chen, Feng Gao, Longfei Ma, Hongen Liao, Yu Wang, Chao Yu, Boyu Zhang, Fei Xing
In this study, we present a soft robotic system designed for surgical applications and propose a hysteresis-aware whole-body neural network model that accurately captures and predicts the soft robot's whole-body motion, including its hysteretic behavior.
no code implementations • 16 Apr 2025 • Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei
We introduce BitNet b1. 58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale.
no code implementations • 16 Apr 2025 • Yuan Luo, Rudolf Hoffmann, Yan Xia, Olaf Wysocki, Benedikt Schwab, Thomas H. Kolbe, Daniel Cremers
Moreover, we propose a novel neural network, RADLER, leveraging the effectiveness of contrastive self-supervised learning (SSL) and semantic 3D city models to enhance radar object detection of pedestrians, cyclists, and cars.
no code implementations • CVPR 2025 • Yan Xia, Xiaowei Zhou, Etienne Vouga, QiXing Huang, Georgios Pavlakos
In this paper, we introduce a method for reconstructing 3D humans from a single image using a biomechanically accurate skeleton model.
no code implementations • 26 Mar 2025 • Sashuai Zhou, Hai Huang, Yan Xia
Multi-modal models excel in cross-modal tasks but are computationally expensive due to their billions of parameters.
no code implementations • CVPR 2025 • Yunshuang Yuan, Yan Xia, Daniel Cremers, Monika Sester
In this work, we design a fully sparse framework, SparseAlign, with three key features: an enhanced sparse 3D backbone, a query-based temporal context learning module, and a robust detection head specially tailored for sparse features.
no code implementations • 17 Mar 2025 • Johannes Meier, Louis Inchingolo, Oussema Dhaouadi, Yan Xia, Jacques Kaiser, Daniel Cremers
We tackle the problem of monocular 3D object detection across different sensors, environments, and camera setups.
no code implementations • 14 Mar 2025 • Ziwei Shi, Xiaoran Zhang, Yan Xia, Yu Zang, Siqi Shen, Cheng Wang
To overcome this, we first construct XA-L&RSI dataset, which encompasses approximately $110, 000$ remote sensing submaps and $13, 000$ LiDAR point cloud submaps captured in urban scenes, and propose a novel method, L2RSI, for cross-view LiDAR place recognition using high-resolution Remote Sensing Imagery.
no code implementations • 9 Mar 2025 • Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll
By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes.
no code implementations • 7 Mar 2025 • Guanghao Zhang, Tao Zhong, Yan Xia, Zhelun Yu, Haoyuan Li, Wanggui He, Fangxun Shu, Mushui Liu, Dong She, Yi Wang, Hao Jiang
The construction of interleaved multimodal multi-step reasoning chains, which utilize critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals.
1 code implementation • 20 Feb 2025 • Minjie Hong, Yan Xia, Zehan Wang, Jieming Zhu, Ye Wang, Sihang Cai, Xiaoda Yang, Quanyu Dai, Zhenhua Dong, Zhimeng Zhang, Zhou Zhao
Large language models (LLMs) are increasingly leveraged as foundational backbones in the development of advanced recommender systems, offering enhanced capabilities through their extensive knowledge and reasoning.
1 code implementation • 20 Feb 2025 • Thomas Froech, Olaf Wysocki, Yan Xia, Junyu Xie, Benedikt Schwab, Daniel Cremers, Thomas H. Kolbe
To address this challenge, we introduce FacaDiffy, a novel method for inpainting unseen facade parts by completing conflict maps with a personalized Stable Diffusion model.
1 code implementation • 17 Feb 2025 • Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei
The advent of 1-bit large language models (LLMs), led by BitNet b1. 58, has spurred interest in ternary LLMs.
1 code implementation • 13 Jan 2025 • Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei
Ultimately, MVoT establishes new possibilities for complex reasoning tasks where visual thinking can effectively complement verbal reasoning.
no code implementations • 26 Dec 2024 • Hai Huang, Shulei Wang, Yan Xia
Recent research in the domain of multimodal unified representations predominantly employs codebook as representation forms, utilizing Vector Quantization(VQ) for quantization, yet there has been insufficient exploration of other quantization representation forms.
no code implementations • 16 Dec 2024 • Yan Xia, Zhendong Li, Yun-Jin Li, Letian Shi, Hu Cao, João F. Henriques, Daniel Cremers
To date, most place recognition methods focus on single-modality retrieval.
no code implementations • 13 Dec 2024 • Yan Xia, Yunxiang Lu, Rui Song, Oussema Dhaouadi, João F. Henriques, Daniel Cremers
To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla.
1 code implementation • 28 Nov 2024 • Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers
We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs.
no code implementations • 14 Nov 2024 • Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero, Steven Niederren, Fengming Lin, Yan Xia, Jiacheng Wang, Riyu Qiu, Liansheng Wang, Arya Yazdan Panah, Rosana El Jurdi, Guanghui Fu, Janan Arslan, Ghislain Vaillant, Romain Valabregue, Didier Dormont, Bruno Stankoff, Olivier Colliot, Luisa Vargas, Isai Daniel Chacón, Ioannis Pitsiorlas, Pablo Arbeláez, Maria A. Zuluaga, Stefanie Schreiber, Oliver Speck, Andreas Nürnberger
The human brain receives nutrients and oxygen through an intricate network of blood vessels.
1 code implementation • 7 Nov 2024 • Olaf Wysocki, Yue Tan, Thomas Froech, Yan Xia, Magdalena Wysocki, Ludwig Hoegner, Daniel Cremers, Christoph Holst
Facade semantic segmentation is a long-standing challenge in photogrammetry and computer vision.
1 code implementation • 21 Oct 2024 • Jinheng Wang, Hansong Zhou, Ting Song, Shaoguang Mao, Shuming Ma, Hongyu Wang, Yan Xia, Furu Wei
Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1. 58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption.
1 code implementation • 29 Sep 2024 • Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia
Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks.
no code implementations • 10 Sep 2024 • Zehong Shen, Huaijin Pi, Yan Xia, Zhi Cen, Sida Peng, Zechen Hu, Hujun Bao, Ruizhen Hu, Xiaowei Zhou
Instead, we propose estimating human poses in a novel Gravity-View (GV) coordinate system, which is defined by the world gravity and the camera view direction.
no code implementations • 31 Aug 2024 • Jialiang Wang, Yan Xia, Ye Yuan
A second-order-based latent factor (SLF) analysis model demonstrates superior performance in graph representation learning, particularly for high-dimensional and incomplete (HDI) interaction data, by incorporating the curvature information of the loss landscape.
1 code implementation • 7 Aug 2024 • Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang
However, the fusion of LiDAR and 4D radar is challenging because they differ significantly in terms of data quality and the degree of degradation in adverse weather.
1 code implementation • 17 Jul 2024 • Hu Cao, Zehua Zhang, Yan Xia, Xinyi Li, Jiahao Xia, Guang Chen, Alois Knoll
The core concept is the design of the coarse-to-fine fusion module, denoted as the cross-modality adaptive feature refinement (CAFR) module.
Ranked #1 on
Object Detection
on PKU-DDD17-Car
no code implementations • 11 Jul 2024 • Yu Feng, Yiming Xu, Yan Xia, Claus Brenner, Monika Sester
In this study, we present a novel approach that leverages deep neural networks to learn a model capable of filling gaps in urban scenes that are obscured by vehicle occlusion.
no code implementations • 8 Jul 2024 • Yadong Zhang, Shaoguang Mao, Wenshan Wu, Yan Xia, Tao Ge, Man Lan, Furu Wei
This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models.
no code implementations • 8 Jul 2024 • Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers
3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost the performance of grasping under occlusion and generalized to the real world.
no code implementations • 25 Jun 2024 • Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao
Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries.
1 code implementation • 20 Jun 2024 • Ye Wang, Jiahao Xun, Minjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong
Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem.
no code implementations • 17 Jun 2024 • Peizhong Gao, Ao Xie, Shaoguang Mao, Wenshan Wu, Yan Xia, Haipeng Mi, Furu Wei
MRP represents a significant advancement in enabling LLMs to identify cognitive challenges across problems and leverage benefits across different reasoning approaches, enhancing their ability to handle diverse and complex problem domains efficiently.
no code implementations • CVPR 2025 • Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu
Localizing events in videos based on semantic queries is a pivotal task in video understanding, with the growing significance of user-oriented applications like video search.
1 code implementation • 4 Apr 2024 • Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks.
no code implementations • 1 Apr 2024 • Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei
This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
1 code implementation • 21 Mar 2024 • Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers
To tackle this issue, we propose Voxel-Cross-Pixel (VXP), a novel camera-to-LiDAR place recognition framework that enforces local similarities in a self-supervised manner and effectively brings global context from images and LiDAR scans into a shared feature space.
no code implementations • 8 Mar 2024 • Hai Huang, Yan Xia, Shengpeng Ji, Shulei Wang, Hanting Wang, Minghui Fang, Jieming Zhu, Zhenhua Dong, Sashuai Zhou, Zhou Zhao
To enhance the interpretability of multimodal unified representations, many studies have focused on discrete unified representations.
1 code implementation • 23 Feb 2024 • Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Nina Cheng, Nishant Ravikumar, Alejandro F. Frangi
The automated segmentation of cerebral aneurysms is pivotal for accurate diagnosis and treatment planning.
1 code implementation • 23 Feb 2024 • Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Kun Wu, Nishant Ravikumar, Alejandro F. Frangi
Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models.
no code implementations • 2 Feb 2024 • Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei
LLMs and LLM agents often struggle with strategic reasoning due to the absence of a reasoning framework that enables them to dynamically infer others' perspectives and adapt to changing environments.
no code implementations • 21 Dec 2023 • Haifeng Huang, Yang Zhao, Zehan Wang, Yan Xia, Zhou Zhao
Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not.
1 code implementation • 17 Dec 2023 • Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao
Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.
1 code implementation • CVPR 2024 • Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers
We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text.
Ranked #2 on
Visual Place Recognition
on KITTI360pose
1 code implementation • 22 Nov 2023 • Nicolás Gaggion, Benjamin A. Matheson, Yan Xia, Rodrigo Bonazzola, Nishant Ravikumar, Zeike A. Taylor, Diego H. Milone, Alejandro F. Frangi, Enzo Ferrante
In response, we introduce HybridVNet, a novel architecture for direct image-to-mesh extraction seamlessly integrating standard convolutional neural networks with graph convolutions, which we prove can efficiently handle surface and volumetric meshes by encoding them as graph structures.
1 code implementation • NeurIPS 2023 • Haoyi Duan, Yan Xia, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao
This mechanism leverages audio and visual modalities as soft prompts to dynamically adjust the parameters of pre-trained models based on the current multi-modal input features.
1 code implementation • 6 Nov 2023 • Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei
This paper introduces Alympics (Olympics for Agents), a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
no code implementations • 12 Oct 2023 • Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan
In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.
no code implementations • 12 Sep 2023 • Yiming Shan, Yan Xia, Yuhong Chen, Daniel Cremers
In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data.
no code implementations • 24 Aug 2023 • Yash Deo, Rodrigo Bonazzola, Haoran Dou, Yan Xia, Tianyou Wei, Nishant Ravikumar, Alejandro F. Frangi, Toni Lassila
We present an encoder-decoder model for synthesising segmentations of the main cerebral arteries in the circle of Willis (CoW) from only T2 MRI.
1 code implementation • 7 Aug 2023 • Fengming Lin, Yan Xia, Nishant Ravikumar, Qiongyao Liu, Michael MacRaild, Alejandro F Frangi
Accurate segmentation of brain vessels is crucial for cerebrovascular disease diagnosis and treatment.
1 code implementation • 14 Jul 2023 • Zifeng Ding, Jingcheng Wu, Jingpei Wu, Yan Xia, Volker Tresp
We develop two new benchmark HTKG datasets, i. e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models hyper-relational temporal facts.
no code implementations • 8 Jun 2023 • Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien
To leverage NLP models, speech input is first force-aligned with texts, and then pre-processed into a token sequence, including words and phrase break information.
1 code implementation • 17 May 2023 • Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao
The training dataset comprises over 3. 7 million sentences and 12. 7 million suggestions generated through rules.
1 code implementation • 11 May 2023 • Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
1 code implementation • 10 May 2023 • Olaf Wysocki, Yan Xia, Magdalena Wysocki, Eleonora Grilli, Ludwig Hoegner, Daniel Cremers, Uwe Stilla
To this end, we leverage laser physics and 3D building model priors to probabilistically identify model conflicts.
2 code implementations • 17 Apr 2023 • Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei
By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.
no code implementations • 29 Mar 2023 • Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan
On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.
1 code implementation • Computer Methods and Programs in Biomedicine 2023 • Fengming Lin, Yan Xia, Shuang Song, Nishant Ravikumar, Alejandro F Frangi
Results:On the internal clinical dataset, our method consistently outperformed several state-of-the-art approaches for vessel and aneurysm segmentation, achieving an average Dice score of 0. 81 (0. 15 higher than nnUNet) and an average surface-to-surface error of 0. 20 mm (less than the in-plane resolution (0. 35 mm/pixel)) for aneurysm segmentation; and an average Dice score of 0. 91 and average surface-to-surface error of 0. 25 mm for vessel segmentation.
no code implementations • 7 Jan 2023 • Rodrigo Bonazzola, Enzo Ferrante, Nishant Ravikumar, Yan Xia, Bernard Keavney, Sven Plein, Tanveer Syeda-Mahmood, Alejandro F Frangi
Here, we propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles (UPE).
no code implementations • NeurIPS 2023 • Tao Ge, Jing Hu, Li Dong, Shaoguang Mao, Yan Xia, Xun Wang, Si-Qing Chen, Furu Wei
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL).
1 code implementation • 24 Nov 2022 • Xiang Chen, Yan Xia, Nishant Ravikumar, Alejandro F Frangi
In such scenarios, enforcing smooth, globally continuous deformation fields leads to incorrect/implausible registration results.
1 code implementation • ICCV 2023 • Yan Xia, Mariia Gladkova, Rui Wang, Qianyun Li, Uwe Stilla, João F. Henriques, Daniel Cremers
CASSPR uses queries from one branch to try to match structures in the other branch, ensuring that both extract self-contained descriptors of the point cloud (rather than one branch dominating), but using both to inform the output global descriptor of the point cloud.
no code implementations • 28 Oct 2022 • Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia
The token sequence is then fed into the pre-training and fine-tuning pipeline.
1 code implementation • 1 Sep 2022 • Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren
To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise.
1 code implementation • 8 Mar 2022 • Yan Xia, Qiangqiang Wu, Wei Li, Antoni B. Chan, Uwe Stilla
Recent works on 3D single object tracking treat the task as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for the tracking.
1 code implementation • CVPR 2022 • Yan Xia, Zhou Zhao
Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.
no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu
These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.
no code implementations • 1 Oct 2021 • Yan Xia, Linhui Jiang, Lu Wang, Xue Chen, Jianjie Ye, Tangyan Hou, Liqiang Wang, Yibo Zhang, Mengying Li, Zhen Li, Zhe Song, Yaping Jiang, Weiping Liu, Pengfei Li, Daniel Rosenfeld, John H. Seinfeld, Shaocai Yu
Our results show that the ORRS measurements, assisted by the machine-learning-based ensemble model developed here, can realize day-to-day supervision of on-road vehicle-specific emissions.
1 code implementation • 9 Jul 2021 • Xiang Chen, Nishant Ravikumar, Yan Xia, Alejandro F Frangi
Image registration aims to establish spatial correspondence across pairs, or groups of images, and is a cornerstone of medical image computing and computer-assisted-interventions.
no code implementations • 11 Jun 2021 • Xiang Chen, Yan Xia, Nishant Ravikumar, Alejandro F Frangi
Image registration is a fundamental building block for various applications in medical image analysis.
no code implementations • 13 May 2021 • Alim Virani, Jay Baxter, Dan Shiebler, Philip Gautier, Shivam Verma, Yan Xia, Apoorv Sharma, Sumit Binnani, LinLin Chen, Chenguang Yu
Traditionally, heuristic methods are used to generate candidates for large scale recommender systems.
1 code implementation • 19 Apr 2021 • Yaqi Xia, Yan Xia, Wei Li, Rui Song, Kailang Cao, Uwe Stilla
We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed as ASFM-Net.
1 code implementation • CVPR 2021 • Yan Xia, Yusheng Xu, Shuang Li, Rui Wang, Juan Du, Daniel Cremers, Uwe Stilla
We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors.
Ranked #5 on
3D Place Recognition
on Oxford RobotCar Dataset
(AR@1% metric)
no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weakness in assessing a speech utterance: 1) Phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) The rank ordering information has not been well exploited in GOP scoring for delivering a robust assessment and correlate well with a human rater's evaluations.
1 code implementation • 8 Aug 2020 • Yan Xia, Yusheng Xu, Cheng Wang, Uwe Stilla
Moreover, a new refiner module is also presented to preserve the vehicle details from inputs and refine the complete outputs with fine-grained information.
1 code implementation • 8 Sep 2018 • Yan Xia, Yang Zhang, Dingfu Zhou, Xinyu Huang, Cheng Wang, Ruigang Yang
Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud.
no code implementations • 3 Jan 2017 • Xiaolin Huang, Yan Xia, Lei Shi, Yixing Huang, Ming Yan, Joachim Hornegger, Andreas Maier
Aiming at overexposure correction for computed tomography (CT) reconstruction, we in this paper propose a mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements.
no code implementations • ICCV 2015 • Yan Xia, Xudong Cao, Fang Wen, Gang Hua, Jian Sun
We study the problem of automatically removing outliers from noisy data, with application for removing outlier images from an image collection.
no code implementations • CVPR 2015 • Yan Xia, Kaiming He, Pushmeet Kohli, Jian Sun
This paper addresses the problem of learning long binary codes from high-dimensional data.