Search Results for author: Yan Xia

Found 45 papers, 20 papers with code

Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

1 code implementation4 Apr 2024 Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei

Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks.

Visual Navigation

LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

no code implementations1 Apr 2024 Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

This paper presents a comprehensive survey of the current status and opportunities for Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning that necessitates understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.

Decision Making

VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

no code implementations21 Mar 2024 Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers

Recent works on the global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed in image-based and LiDAR-based modalities.

Cross-Modal Retrieval Retrieval

Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment

1 code implementation8 Mar 2024 Hai Huang, Yan Xia, Shengpeng Ji, Shulei Wang, Hanting Wang, Jieming Zhu, Zhenhua Dong, Zhou Zhao

The Dual Cross-modal Information Disentanglement (DCID) model, utilizing a unified codebook, shows promising results in achieving fine-grained representation and cross-modal generalization.

Disentanglement

Unsupervised Domain Adaptation for Brain Vessel Segmentation through Transwarp Contrastive Learning

1 code implementation23 Feb 2024 Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Kun Wu, Nishant Ravikumar, Alejandro F. Frangi

Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models.

Contrastive Learning Unsupervised Domain Adaptation

K-Level Reasoning with Large Language Models

no code implementations2 Feb 2024 Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Man Lan, Furu Wei

While Large Language Models (LLMs) have demonstrated their proficiency in complex reasoning tasks, their performance in dynamic, interactive, and competitive scenarios - such as business strategy and stock market analysis - remains underexplored.

Decision Making

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

no code implementations21 Dec 2023 Haifeng Huang, Yang Zhao, Zehan Wang, Yan Xia, Zhou Zhao

Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not.

Unsupervised Domain Adaptation Video Grounding

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

no code implementations17 Dec 2023 Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.

Quantization Singing Voice Synthesis +1

Text2Loc: 3D Point Cloud Localization from Natural Language

no code implementations27 Nov 2023 Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers

We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text.

Contrastive Learning

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks

1 code implementation NeurIPS 2023 Haoyi Duan, Yan Xia, Mingze Zhou, Li Tang, Jieming Zhu, Zhou Zhao

This mechanism leverages audio and visual modalities as soft prompts to dynamically adjust the parameters of pre-trained models based on the current multi-modal input features.

ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents

1 code implementation6 Nov 2023 Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei

This paper introduces Alympics (Olympics for Agents), a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.

Decision Making Language Modelling +1

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

no code implementations12 Oct 2023 Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.

In-Context Learning Text Generation

SCP: Scene Completion Pre-training for 3D Object Detection

no code implementations12 Sep 2023 Yiming Shan, Yan Xia, Yuhong Chen, Daniel Cremers

In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data.

3D Object Detection Autonomous Driving +2

Learned Local Attention Maps for Synthesising Vessel Segmentations

no code implementations24 Aug 2023 Yash Deo, Rodrigo Bonazzola, Haoran Dou, Yan Xia, Tianyou Wei, Nishant Ravikumar, Alejandro F. Frangi, Toni Lassila

We present an encoder-decoder model for synthesising segmentations of the main cerebral arteries in the circle of Willis (CoW) from only T2 MRI.

Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge

no code implementations14 Jul 2023 Zifeng Ding, Jingcheng Wu, Jingpei Wu, Yan Xia, Volker Tresp

We develop two new benchmark hyper-relational TKG (HTKG) datasets, i. e., Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models both temporal facts and qualifiers.

Knowledge Graphs Link Prediction +1

Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models

no code implementations8 Jun 2023 Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien

To leverage NLP models, speech input is first force-aligned with texts, and then pre-processed into a token sequence, including words and phrase break information.

text-classification Text Classification

Smart Word Suggestions for Writing Assistance

1 code implementation17 May 2023 Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao

The training dataset comprises over 3. 7 million sentences and 12. 7 million suggestions generated through rules.

Low-code LLM: Graphical User Interface over Large Language Models

2 code implementations17 Apr 2023 Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.

Prompt Engineering

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

no code implementations29 Mar 2023 Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.

Code Generation Common Sense Reasoning +1

CASSPR: Cross Attention Single Scan Place Recognition

1 code implementation ICCV 2023 Yan Xia, Mariia Gladkova, Rui Wang, Qianyun Li, Uwe Stilla, João F. Henriques, Daniel Cremers

CASSPR uses queries from one branch to try to match structures in the other branch, ensuring that both extract self-contained descriptors of the point cloud (rather than one branch dominating), but using both to inform the output global descriptor of the point cloud.

Video-Guided Curriculum Learning for Spoken Video Grounding

1 code implementation1 Sep 2022 Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise.

Video Grounding

A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds

1 code implementation8 Mar 2022 Yan Xia, Qiangqiang Wu, Wei Li, Antoni B. Chan, Uwe Stilla

Recent works on 3D single object tracking treat the task as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for the tracking.

3D Single Object Tracking motion prediction +1

Cross-Modal Background Suppression for Audio-Visual Event Localization

1 code implementation CVPR 2022 Yan Xia, Zhou Zhao

Audiovisual Event (AVE) localization requires the model to jointly localize an event by observing audio and visual information.

audio-visual event localization

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

no code implementations14 Oct 2021 Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.

Rapid Assessments of Light-Duty Gasoline Vehicle Emissions Using On-Road Remote Sensing and Machine Learning

no code implementations1 Oct 2021 Yan Xia, Linhui Jiang, Lu Wang, Xue Chen, Jianjie Ye, Tangyan Hou, Liqiang Wang, Yibo Zhang, Mengying Li, Zhen Li, Zhe Song, Yaping Jiang, Weiping Liu, Pengfei Li, Daniel Rosenfeld, John H. Seinfeld, Shaocai Yu

Our results show that the ORRS measurements, assisted by the machine-learning-based ensemble model developed here, can realize day-to-day supervision of on-road vehicle-specific emissions.

A Deep Discontinuity-Preserving Image Registration Network

1 code implementation9 Jul 2021 Xiang Chen, Nishant Ravikumar, Yan Xia, Alejandro F Frangi

Image registration aims to establish spatial correspondence across pairs, or groups of images, and is a cornerstone of medical image computing and computer-assisted-interventions.

Image Registration Medical Image Registration +1

ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion

1 code implementation19 Apr 2021 Yaqi Xia, Yan Xia, Wei Li, Rui Song, Kailang Cao, Uwe Stilla

We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed as ASFM-Net.

Point Cloud Completion

SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition

1 code implementation CVPR 2021 Yan Xia, Yusheng Xu, Shuang Li, Rui Wang, Juan Du, Daniel Cremers, Uwe Stilla

We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors.

3D Place Recognition Metric Learning +1

Improving pronunciation assessment via ordinal regression with anchored reference samples

no code implementations26 Oct 2020 Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu

Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weakness in assessing a speech utterance: 1) Phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) The rank ordering information has not been well exploited in GOP scoring for delivering a robust assessment and correlate well with a human rater's evaluations.

regression Sentence

VPC-Net: Completion of 3D Vehicles from MLS Point Clouds

1 code implementation8 Aug 2020 Yan Xia, Yusheng Xu, Cheng Wang, Uwe Stilla

Moreover, a new refiner module is also presented to preserve the vehicle details from inputs and refine the complete outputs with fine-grained information.

Autonomous Driving

RealPoint3D: Point Cloud Generation from a Single Image with Complex Background

1 code implementation8 Sep 2018 Yan Xia, Yang Zhang, Dingfu Zhou, Xinyu Huang, Cheng Wang, Ruigang Yang

Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud.

3D Generation Point Cloud Generation

Mixed one-bit compressive sensing with applications to overexposure correction for CT reconstruction

no code implementations3 Jan 2017 Xiaolin Huang, Yan Xia, Lei Shi, Yixing Huang, Ming Yan, Joachim Hornegger, Andreas Maier

Aiming at overexposure correction for computed tomography (CT) reconstruction, we in this paper propose a mixed one-bit compressive sensing (M1bit-CS) to acquire information from both regular and saturated measurements.

Compressive Sensing Computed Tomography (CT) +1

Learning Discriminative Reconstructions for Unsupervised Outlier Removal

no code implementations ICCV 2015 Yan Xia, Xudong Cao, Fang Wen, Gang Hua, Jian Sun

We study the problem of automatically removing outliers from noisy data, with application for removing outlier images from an image collection.

Cannot find the paper you are looking for? You can Submit a new open access paper.