Search Results for author: Xi Wang

Found 134 papers, 36 papers with code

Improving Embedding-based Large-scale Retrieval via Label Enhancement

no code implementations Findings (EMNLP) 2021 Peiyang Liu, Xi Wang, Sen Wang, Wei Ye, Xiangyu Xi, Shikun Zhang

Current embedding-based large-scale retrieval models are trained with 0-1 hard label that indicates whether a query is relevant to a document, ignoring rich information of the relevance degree.

Retrieval

Look a Group at Once: Multi-Slide Modeling for Survival Prediction

no code implementations 18 Nov 2024 Xinyang Li, Yi Zhang, Yi Xie, Jianfei Yang, Xi Wang, Hao Chen, Haixian Zhang

In this paper, we introduce GroupMIL, a novel framework inspired by the clinical practice of collective analysis, which models multiple slides as a single sample and organizes groups of patches and slides sequentially to capture cross-slide prognostic features.

Survival Prediction

Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs

no code implementations 19 Oct 2024 Xiaocheng Zhang, Xi Wang, Yifei Lu, Zhuangzhuang Ye, Jianing Wang, Mengjiao Bao, Peng Yan, Xiaohong Su

However, previous studies on explanation generation have shown several limitations, such as being confined to English scenarios, involving overly complex inference processes, and not fully unleashing the potential of the mutual feedback between veracity labels and explanation texts.

Explanation Generation Fact Checking +1

LEAD: Latent Realignment for Human Motion Diffusion

no code implementations 18 Oct 2024 Nefeli Andreou, Xi Wang, Victoria Fernández Abrevaya, Marie-Paule Cani, Yiorgos Chrysanthou, Vicky Kalogeiton

Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language.

Diversity Motion Synthesis

Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation

no code implementations 17 Oct 2024 Xuezhi Xiang, Xi Wang, Lei Zhang, Denis Ombati, Himaloy Himu, XianTong Zhen

Scene flow estimation aims to generate the 3D motion field of points between two consecutive frames of point clouds, which has wide applications in various fields.

Self-supervised Scene Flow Estimation

KBLaM: Knowledge Base augmented Language Model

no code implementations 14 Oct 2024 Xi Wang, Liana Mikaelyan, Taketomo Isazawa, James Hensman

In this paper, we propose Knowledge Base augmented Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge.

8k In-Context Learning +4

Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking

no code implementations 24 Sep 2024 Xi Wang, Tianxing Chen, Qiaojun Yu, Tianling Xu, Zanxin Chen, Yiting Fu, Cewu Lu, Yao Mu, Ping Luo

To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.

Object

MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation

no code implementations 23 Sep 2024 Junqing He, Liang Zhu, Rui Wang, Xi Wang, Reza Haffari, Jiaxing Zhang

Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, evidenced by numerous developed memory-augmented DS (MADS).

Dialogue Generation Retrieval

Learning to Discover Forgery Cues for Face Forgery Detection

no code implementations 2 Sep 2024 Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han

The produced manipulation maps can serve as better supervision to enhance face forgery detectors.

Cloud-Based Federation Framework and Prototype for Open, Scalable, and Shared Access to NextG and IoT Testbeds

no code implementations 26 Aug 2024 Maxwell McManus, Tenzin Rinchen, Annoy Dey, Sumanth Thota, Zhaoxi Zhang, Jiangqi Hu, Xi Wang, Mingyue Ji, Nicholas Mastronarde, Elizabeth Serena Bentley, Michael Medley, Zhangyu Guan

In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments.

Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

1 code implementation 16 Aug 2024 Kaixiang Yang, Wenqi Shan, Xudong Li, Xuan Wang, Xikai Yang, Xi Wang, Pheng-Ann Heng, Qiang Li, Zhiwei Wang

Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, while incomplete modalities significantly degrade performance.

Brain Tumor Segmentation Tumor Segmentation

Spatio-Temporal Communication Compression for Distributed Prime-Dual Optimization

no code implementations 14 Aug 2024 Zihao Ren, Lei Wang, Xinlei Yi, Xi Wang, Deming Yuan, Tao Yang, Zhengguang Wu, Guodong Shi

In this paper, we demonstrate that effective information compression may occur over time or space during sequences of node communications in distributed algorithms, leading to the concept of spatio-temporal compressors.

Distributed Optimization

Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs

no code implementations 10 Aug 2024 Kexin Ma, Ruochun Jin, Xi Wang, Huan Chen, Jing Ren, Yuhua Tang

Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks the data quality issues within retrieval results, which are frequently caused by inaccurate vector-distance-based retrieval methods. We propose to boost the precision of RALMs' answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, where Context Matching Dependencies (CMDs) are employed as logical data quality rules to capture and regulate the consistency between retrieved contexts. Based on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT can effectively identify and discard retrieval results that are inconsistent with the query context and further modify indexes in the database, thereby improving answer quality. Experiments on challenging question-answering tasks demonstrate this improvement. Also, the flexibility of CDIT is verified through its compatibility with various language models and indexing methods, which offers a promising approach to bolster RALMs' data quality and retrieval precision jointly.

Question Answering Retrieval

Source-Free Domain-Invariant Performance Prediction

no code implementations 5 Aug 2024 Ekaterina Khramtsova, Mahsa Baktashmotlagh, Guido Zuccon, Xi Wang, Mathieu Salzmann

In this work, we propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.

Object Recognition

Adaptive Retrieval-Augmented Generation for Conversational Systems

no code implementations 31 Jul 2024 Xi Wang, Procheta Sen, Ruizhe Li, Emine Yilmaz

Despite the success of integrating large language models into the development of conversational systems, many studies have shown the effectiveness of retrieving and augmenting external knowledge for informative responses.

RAG Retrieval

Large Language Model for Verilog Generation with Golden Code Feedback

1 code implementation 21 Jul 2024 Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan

Recent advancements in large language models (LLMs) have catalyzed significant interest in the automatic generation of Register-Transfer Level (RTL) code, particularly Verilog, from natural language instructions.

Language Modelling Large Language Model +2

Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

no code implementations 18 Jul 2024 Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

Therefore, in order to judge whether a specific image is utilized as a member of a model's training set, Membership Inference Attack (MIA) is proposed to serve as a tool for privacy protection.

Inference Attack Membership Inference Attack +1

E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness

1 code implementation 1 Jul 2024 Robin Courant, Nicolas Dufour, Xi Wang, Marc Christie, Vicky Kalogeiton

dataset, we propose a diffusion-based approach, named DIRECTOR, which generates complex camera trajectories from textual captions that describe the relation and synchronisation between the camera and characters.

3D Generation

Towards Synchronous Memorizability and Generalizability with Site-Modulated Diffusion Replay for Cross-Site Continual Segmentation

1 code implementation 26 Jun 2024 Dunyuan Xu, Xi Wang, Jingyang Zhang, Pheng-Ann Heng

To achieve this, we create the orientational gradient alignment to ensure memorizability on previous sites, and arbitrary gradient alignment to enhance generalizability on unseen sites.

Continual Learning Domain Generalization +3

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

no code implementations 21 Jun 2024 Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.

How to set AdamW's weight decay as you scale model and dataset size

no code implementations 22 May 2024 Xi Wang, Laurence Aitchison

This gives critical insights for how to set the weight decay in AdamW, and how the weight decay should scale with model and dataset size.

Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection

1 code implementation 30 Apr 2024 Cai Yu, Shan Jia, Xiaomeng Fu, Jin Liu, Jiahe Tian, Jiao Dai, Xi Wang, Siwei Lyu, Jizhong Han

With the rising prevalence of deepfakes, there is a growing interest in developing generalizable detection methods for various types of deepfakes.

Audio-Visual Synchronization DeepFake Detection +1

WANDR: Intention-guided Human Motion Generation

no code implementations CVPR 2024 Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black

To address this, we introduce WANDR, a data-driven model that takes an avatar's initial pose and a goal's 3D position and generates natural human motions that place the end effector (wrist) on the goal location.

Motion Generation

FilterPrompt: Guiding Image Transfer in Diffusion Models

no code implementations 20 Apr 2024 Xi Wang, Yichen Peng, Heng Fang, Haoran Xie, Xi Yang, Chuntao Li

Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately.

Feature Correlation

Analysis of Classifier-Free Guidance Weight Schedulers

no code implementations 19 Apr 2024 Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernandez Abrevaya, David Picard, Vicky Kalogeiton

Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models.

Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

no code implementations 10 Apr 2024 Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input.

Action Anticipation Graph Neural Network +2

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

no code implementations 10 Apr 2024 Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang

Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important.

Activity Recognition Gaze Prediction +1

A Directional Diffusion Graph Transformer for Recommendation

no code implementations 4 Apr 2024 Zixuan Yi, Xi Wang, Iadh Ounis

To account for and model possible noise in the users' interactions in graph neural recommenders, we propose a novel Diffusion Graph Transformer (DiffGT) model for top-k recommendation.

Denoising Recommendation Systems

I-Design: Personalized LLM Interior Designer

no code implementations 3 Apr 2024 Ata Çelen, Guo Han, Konrad Schindler, Luc van Gool, Iro Armeni, Anton Obukhov, Xi Wang

Interior design allows us to be who we are and live how we want - each design is as unique as our distinct personality.

Language Modelling Large Language Model +2

Characteristic AI Agents via Large Language Models

1 code implementation 19 Mar 2024 Xi Wang, Hongliang Dai, Shen Gao, Piji Li

In response to this research gap, we create a benchmark for the characteristic AI agents task, including dataset, techniques, and evaluation metrics.

Chatbot

Model Will Tell: Training Membership Inference for Diffusion Models

no code implementations 13 Mar 2024 Xiaomeng Fu, Xi Wang, Qiao Li, Jin Liu, Jiao Dai, Jizhong Han

In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.

Binary Classification

Noise Level Adaptive Diffusion Model for Robust Reconstruction of Accelerated MRI

1 code implementation 8 Mar 2024 Shoujin Huang, GuanXiong Luo, Xi Wang, Ziran Chen, Yuwan Wang, Huaishui Yang, Pheng-Ann Heng, Lingyan Zhang, Mengye Lyu

In general, diffusion model-based MRI reconstruction methods incrementally remove artificially added noise while imposing data consistency to reconstruct the underlying images.

Denoising MRI Reconstruction

Batch size invariant Adam

no code implementations 29 Feb 2024 Xi Wang, Laurence Aitchison

We propose a batch size invariant version of Adam, for use in large-scale, distributed settings, in which the mini-batch is divided into micro-batches which are distributed among worker nodes.

Joint Beamforming Design and Stream Allocation for Non-Coherent Joint Transmission in Cell-Free MIMO Networks

no code implementations 28 Feb 2024 Xi Wang, Xiaotong Zhao, Juncheng Wang, You Li, Qingjiang Shi

We then propose a joint beamforming and linear stream allocation algorithm, termed as RWMMSE-LSA, which yields closed-form updates with linear stream allocation complexity and is guaranteed to converge to the stationary points of the original joint optimization problem.

Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images

no code implementations 21 Feb 2024 Xikai Yang, Jian Wu, Xi Wang, Yuchen Yuan, Ning Li Wang, Pheng-Ann Heng

Extensive experiments on the Sequential fundus Images for Glaucoma Forecast (SIGF) dataset demonstrate the superiority of the proposed MST-former method, achieving an AUC of 98.6% for glaucoma forecasting.

Disease Prediction Irregular Time Series +1

Transparent and Scrutable Recommendations Using Natural Language User Profiles

1 code implementation 8 Feb 2024 Jerome Ramos, Hossen A. Rahmani, Xi Wang, Xiao Fu, Aldo Lipani

Given the recent advances in Large Language Models (LLMs), we investigate how a properly crafted prompt can be used to summarize a user's preferences from past reviews and recommend items based only on language-based preferences.

Benchmarking Descriptive +1

Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness

1 code implementation 2 Feb 2024 Hossein A. Rahmani, Xi Wang, Mohammad Aliannejadi, Mohammadmehdi Naghiaei, Emine Yilmaz

Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance.

Information Retrieval

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

1 code implementation 31 Jan 2024 Xiaopeng Li, Shasha Li, Shezheng Song, Huijun Liu, Bin Ji, Xi Wang, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, Weimin Zhang

In particular, local editing methods, which directly update model parameters, are more suitable for updating a small amount of knowledge.

Model Editing Word Embeddings

An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models

no code implementations 26 Jan 2024 Xi Wang, Ruoqing Zhao, Hongliang Dai, Piji Li

Chinese Spelling Check (CSC) is a meaningful task in the area of Natural Language Processing (NLP) which aims at detecting spelling errors in Chinese texts and then correcting these errors.

Domain Adaptation Language Modelling +1

Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading

no code implementations 17 Jan 2024 Dunyuan Xu, Xi Wang, Jinyue Cai, Pheng-Ann Heng

Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly.

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

no code implementations CVPR 2024 Xi Wang, Xu Yang, Jie Yin, Kun Wei, Cheng Deng

In this paper we constructed two parallel spaces simultaneously: 1) Sub-prototype space and 2) Reminiscence space to learn robust representations while alleviating forgetfulness.

class-incremental learning Class Incremental Learning +2

Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning

no code implementations 26 Dec 2023 Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li

Automated radiology report generation has the potential to improve radiology reporting and alleviate the workload of radiologists.

Contrastive Learning Image Segmentation +4

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

no code implementations 19 Dec 2023 Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng

Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis especially for the role and out-of-domain scenarios.

Decoder Speech Synthesis

Decoupling Degradation and Content Processing for Adverse Weather Image Restoration

no code implementations 8 Dec 2023 Xi Wang, Xueyang Fu, Peng-Tao Jiang, Jie Huang, Mi Zhou, Bo Li, Zheng-Jun Zha

The former facilitates channel-dependent degradation removal operation, allowing the network to tailor responses to various adverse weather types; the latter, by integrating Fourier's global properties into channel-independent content features, enhances network capacity for consistent global content reconstruction.

Image Restoration

PALM: Predicting Actions through Language Models

no code implementations 29 Nov 2023 Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc van Gool, Xi Wang

Traditional methods heavily rely on representation learning that is trained on a large amount of video data.

Action Anticipation Action Recognition +4

Optimizing and Fine-tuning Large Language Model for Urban Renewal

no code implementations 27 Nov 2023 Xi Wang, Xianyao Ling, Tom Zhang, Xuecao Li, Shaolan Wang, Zhixing Li, Liang Zhang, Peng Gong

This study demonstrates the effectiveness and superiority of the joint fine-tuning method using Prefix and LoRA for ChatGLM in the urban renewal knowledge QA tasks.

Language Modelling Large Language Model +2

A Social-aware Gaussian Pre-trained Model for Effective Cold-start Recommendation

no code implementations 27 Nov 2023 Siwei Liu, Xi Wang, Craig Macdonald, Iadh Ounis

We propose a novel recommendation model, the Social-aware Gaussian Pre-trained model (SGP), which encodes the user social relations and interaction data at the pre-training stage in a Graph Neural Network (GNN).

Graph Neural Network

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

1 code implementation 20 Nov 2023 Nikola Popovic, Dimitrios Christodoulou, Danda Pani Paudel, Xi Wang, Luc van Gool

In this work, we propose to predict 3D eye gaze from weak supervision of eye semantic segmentation masks and direct supervision of a few 3D gaze vectors.

Semantic Segmentation

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation

1 code implementation 25 Oct 2023 Xi Wang, Hossein A. Rahmani, Jiqun Liu, Emine Yilmaz

Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques.

Conversational Recommendation Data Augmentation +3

LoRA ensembles for large language model fine-tuning

no code implementations 29 Sep 2023 Xi Wang, Laurence Aitchison, Maja Rudolph

To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique.

Language Modelling Large Language Model +2

OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

no code implementations 28 Sep 2023 Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

Other works construct a one-to-one mapping between audio signals and head motion sequences, introducing ambiguous correspondences into the mapping since people can behave differently in head motions when speaking the same content.

Talking Head Generation Video Generation

Extragradient Type Methods for Riemannian Variational Inequality Problems

no code implementations 25 Sep 2023 Zihao Hu, Guanghui Wang, Xi Wang, Andre Wibisono, Jacob Abernethy, Molei Tao

In the context of Euclidean space, it is established that the last-iterates of both the extragradient (EG) and past extragradient (PEG) methods converge to the solution of monotone variational inequality problems at a rate of $O\left(\frac{1}{\sqrt{T}}\right)$ (Cai et al., 2022).

Selecting which Dense Retriever to use for Zero-Shot Search

no code implementations 18 Sep 2023 Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon

We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e., in a zero-shot setting.

Information Retrieval Retrieval

Large-Scale Automatic Audiobook Creation

no code implementations 7 Sep 2023 Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer

In this work, we present a system that can automatically generate high-quality audiobooks from online e-books.

Text to Speech

BluNF: Blueprint Neural Field

no code implementations 7 Sep 2023 Robin Courant, Xi Wang, Marc Christie, Vicky Kalogeiton

BluNF provides a robust and user-friendly 2D blueprint, enabling intuitive scene editing.

Novel View Synthesis

MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023

no code implementations 6 Sep 2023 Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao

In this paper, we present MuLanTTS, the Microsoft end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023.

Speech Synthesis Text to Speech

MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model

no code implementations 31 Aug 2023 Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

Responsive listening head generation is an important task that aims to model face-to-face communication scenarios by generating a listener head video given a speaker video and a listener head image.

Denoising Diversity

Bayesian Low-rank Adaptation for Large Language Models

2 code implementations 24 Aug 2023 Adam X. Yang, Maxime Robeyns, Xi Wang, Laurence Aitchison

Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of large language models (LLMs).

Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

no code implementations 9 Jul 2023 Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai

When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and establish a natural and intuitive communication system for the human workers and robotic assistants.

Language Modelling Natural Language Understanding +1

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

no code implementations 3 Jul 2023 Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.

Sentence Text to Speech

Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023

1 code implementation 28 Jun 2023 Daoji Huang, Otmar Hilliges, Luc van Gool, Xi Wang

We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models.

Action Anticipation Image Captioning +3

Online Distillation for Pseudo-Relevance Feedback

no code implementations 16 Jun 2023 Sean MacAvaney, Xi Wang

Model distillation has emerged as a prominent technique to improve neural search models.

Re-Ranking Retrieval

Self Contrastive Learning for Session-based Recommendation

1 code implementation 2 Jun 2023 Zhengxiang Shi, Xi Wang, Aldo Lipani

Session-based recommendation, which aims to predict the next item of users' interest as per an existing sequence interaction of items, has attracted growing applications of Contrastive Learning (CL) with improved user and item representations.

Contrastive Learning Data Augmentation +1

A Survey on Asking Clarification Questions Datasets in Conversational Systems

1 code implementation 25 May 2023 Hossein A. Rahmani, Xi Wang, Yue Feng, Qiang Zhang, Emine Yilmaz, Aldo Lipani

The ability to understand a user's underlying needs is critical for conversational systems, especially with limited input from users in a conversation.

EFE: End-to-end Frame-to-Gaze Estimation

no code implementations 9 May 2023 Haldun Balim, Seonwook Park, Xi Wang, Xucong Zhang, Otmar Hilliges

In this paper, we propose a frame-to-gaze network that directly predicts both 3D gaze origin and 3D gaze direction from the raw frame out of the camera without any face or eye cropping.

Gaze Estimation

FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

no code implementations 31 Mar 2023 Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

Specifically, the head pose prediction module is designed to generate head pose sequences from the source face and driving audio.

Diversity Pose Prediction +2

JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields

1 code implementation CVPR 2023 Xi Wang, Robin Courant, Jinglei Shi, Eric Marchand, Marc Christie

This paper presents JAWS, an optimization-driven approach that achieves the robust transfer of visual cinematic features from a reference in-the-wild video clip to a newly generated clip.

SpatialFormer: Semantic and Target Aware Attentions for Few-Shot Learning

1 code implementation 15 Mar 2023 Jinxiang Lai, Siqian Yang, Wenlong Wu, Tao Wu, Guannan Jiang, Xi Wang, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang

Then we derive two specific attention modules, named SpatialFormer Semantic Attention (SFSA) and SpatialFormer Target Attention (SFTA), to enhance the target object regions while reducing the background distraction.

Few-Shot Learning

INO at Factify 2: Structure Coherence based Multi-Modal Fact Verification

1 code implementation 2 Mar 2023 Yinuo Zhang, Zhulin Tao, Xi Wang, Tongyue Wang

Therefore, we proposed a structure coherence-based multi-modal fact verification scheme to classify fake news.

Claim Verification Fact Verification +3

OPT: One-shot Pose-Controllable Talking Head Generation

no code implementations 16 Feb 2023 Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

To solve the identity mismatch problem and achieve high-quality free pose control, we present One-shot Pose-controllable Talking head generation network (OPT).

Disentanglement Talking Head Generation

Human Fall Detection- Multimodality Approach

no code implementations 1 Feb 2023 Xi Wang, Ramya Penta, Bhavya Sehgal, Dale Chen-Song

Falls have become more frequent in recent years and are harmful to senior citizens. Detecting falls has therefore become important, and several datasets and machine learning models have been introduced for fall detection.

Binary Classification

Task2KB: A Public Task-Oriented Knowledge Base

no code implementations 24 Jan 2023 Procheta Sen, Xi Wang, Ruiqing Xu, Emine Yilmaz

Search engines and conversational assistants are commonly used to help users complete their everyday tasks such as booking travel, cooking, etc.

Knowledge Graphs

Structure-guided Image Outpainting

no code implementations 21 Dec 2022 Xi Wang, Weixi Cheng, Wenliang Jia

We propose a deep learning method based on a Generative Adversarial Network (GAN) conditioned on edges as a structural prior to assist the generation.

Generative Adversarial Network Image Inpainting +1

Particle-based Variational Inference with Preconditioned Functional Gradient Flow

no code implementations 25 Nov 2022 Hanze Dong, Xi Wang, Yong Lin, Tong Zhang

With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow.

Variational Inference

Global Meets Local: Effective Multi-Label Image Classification via Category-Aware Weak Supervision

no code implementations 23 Nov 2022 Jiawei Zhan, Jun Liu, Wei Tang, Guannan Jiang, Xi Wang, Bin-Bin Gao, Tianliang Zhang, Wenlong Wu, Wei Zhang, Chengjie Wang, Yuan Xie

This paper builds a unified framework to perform effective noisy-proposal suppression and to interact between global and local features for robust feature learning.

Feature Correlation Multi-Label Image Classification

Rethinking the Metric in Few-shot Learning: From an Adaptive Multi-Distance Perspective

no code implementations 2 Nov 2022 Jinxiang Lai, Siqian Yang, Guannan Jiang, Xi Wang, Yuxi Li, Zihui Jia, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Wei Zhang, Yuan Xie, Chengjie Wang

In this paper, for the first time, we investigate the contributions of different distance metrics, and propose an adaptive fusion scheme, bringing significant improvements in few-shot classification.

Few-Shot Learning

A survey on the development status and application prospects of knowledge graph in smart grids

no code implementations 2 Nov 2022 Jian Wang, Xi Wang, Chaoqun Ma, Lei Kou

With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention.

Decision Making

Joint control variate for faster black-box variational inference

1 code implementation 13 Oct 2022 Xi Wang, Tomas Geffner, Justin Domke

Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance.

Stochastic Optimization Variational Inference

Causal Intervention for Fairness in Multi-behavior Recommendation

no code implementations 10 Sep 2022 Xi Wang, Wenjie Wang, Fuli Feng, Wenge Rong, Chuantao Yin, Zhang Xiong

Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, leading to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness.

Fairness Recommendation Systems

Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors

no code implementations 6 Sep 2022 Xi Wang, Gen Li, Yen-Ling Kuo, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

We further qualitatively evaluate the effectiveness of our method on real images and demonstrate its generalizability towards interaction types and object categories.

Human-Object Interaction Detection Object

Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources

1 code implementation 14 Jul 2022 Ji Liu, Daxiang Dong, Xi Wang, An Qin, Xingjian Li, Patrick Valduriez, Dejing Dou, Dianhai Yu

Although more layers and more parameters generally improve the accuracy of the models, such big models generally have high computational complexity and require large memory, which exceeds the capacity of small devices for inference and incurs long training times.

Knowledge Distillation

Rethinking Persistent Homology for Visual Recognition

no code implementations 9 Jul 2022 Ekaterina Khramtsova, Guido Zuccon, Xi Wang, Mahsa Baktashmotlagh

This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios, defined by: the number of training samples, the complexity of the training data and the complexity of the backbone network.

Image Classification

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

no code implementations 25 Jun 2022 Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie

In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner.

Contrastive Learning Deep Clustering +2

Robustness to corruption in pre-trained Bayesian neural networks

1 code implementation 24 Jun 2022 Xi Wang, Laurence Aitchison

We develop ShiftMatch, a new training-data-dependent likelihood for robustness to corruption in Bayesian neural networks (BNNs).

Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

no code implementations 1 Jun 2022 Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva

Second, they allow us to generate novel "Deepfake Caricatures": transformations of the deepfake that exacerbate artifacts to improve human detection.

DeepFake Detection Face Swapping +2

PrEF: Percolation-based Evolutionary Framework for the diffusion-source-localization problem in large networks

no code implementations 16 May 2022 Yang Liu, Xiaoqi Wang, Xi Wang, Zhen Wang, Jürgen Kurths

We assume that the state of a number of nodes in a network could be investigated if necessary, and study what configuration of those nodes could facilitate a better solution for the diffusion-source-localization (DSL) problem.

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

3 code implementations 9 May 2022 Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Sentence Speech Synthesis +2

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

1 code implementation CVPR 2022 Jiahao Xia, Weiwei Qu, Wenjian Huang, JianGuo Zhang, Xi Wang, Min Xu

The SLPT generates the representation of each single landmark from a local patch and aggregates them by an adaptive inherent relation based on the attention mechanism.

Face Alignment Relation +1

EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection

1 code implementation 21 Dec 2021 Zhe Liu, Tengteng Huang, Bingling Li, Xiwu Chen, Xi Wang, Xiang Bai

Recently, fusing the LiDAR point cloud and camera image to improve the performance and robustness of 3D object detection has received more and more attention, as these two modalities naturally possess strong complementarity.

3D Object Detection object-detection

No-regret Online Learning over Riemannian Manifolds

no code implementations NeurIPS 2021 Xi Wang, Zhipeng Tu, Yiguang Hong, Yingyi Wu, Guodong Shi

We consider online optimization over Riemannian manifolds, where a learner attempts to minimize a sequence of time-varying loss functions defined on Riemannian manifolds.

Monolithic Integrated Multiband Acoustic Devices on Heterogeneous Substrate for Sub-6 GHz RF-FEMs

no code implementations 20 Oct 2021 Shibin Zhang, Hongyan Zhou, Pengcheng Zheng, Jinbo Wu, Liping Zhang, Zhongxu Li, Kai Huang, Xin Ou, Xi Wang

Monolithic integration of multiband (1.4~6.0 GHz) RF acoustic devices was successfully demonstrated within the same process flow by using the lithium niobate (LN) thin film on silicon carbide (LNOSiC) substrate.

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

no code implementations 3 Sep 2021 Xinwei He, Silin Cheng, Dingkang Liang, Song Bai, Xi Wang, Yingying Zhu

To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.

3D Object Classification 3D Object Retrieval +3

PeCLR: Self-Supervised 3D Hand Pose Estimation from monocular RGB via Equivariant Contrastive Learning

1 code implementation ICCV 2021 Adrian Spurr, Aneesh Dahiya, Xi Wang, Xucong Zhang, Otmar Hilliges

Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation.

3D Hand Pose Estimation Contrastive Learning +3

Speech BERT Embedding For Improving Prosody in Neural TTS

no code implementations 8 Jun 2021 Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

Experimental results obtained by the Transformer TTS show that the proposed BERT can extract fine-grained, segment-level prosody, which is complementary to utterance-level prosody to improve the final prosody of the TTS speech.

Decoder Text to Speech

QuadrupletBERT: An Efficient Model For Embedding-Based Large-Scale Retrieval

no code implementations NAACL 2021 Peiyang Liu, Sen Wang, Xi Wang, Wei Ye, Shikun Zhang

The embedding-based large-scale query-document retrieval problem is a hot topic in the information retrieval (IR) field.

Information Retrieval Retrieval

Turnover-Adjusted Information Ratio

no code implementations 19 May 2021 Feng Zhang, Xi Wang, Honggao Cao

In this paper, we study the behavior of information ratio (IR) as determined by the fundamental law of active investment management.

Management

Bayesian OOD detection with aleatoric uncertainty and outlier exposure

no code implementations AABI Symposium 2022 Xi Wang, Laurence Aitchison

In particular, aleatoric uncertainty signals a specific type of OOD point: one without a well-defined class-label, and our model of data curation gives a likelihood for these points, giving us a mechanism for conditioning on outlier points and thus performing principled Bayesian outlier exposure.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Leveraging Review Properties for Effective Recommendation

no code implementations 5 Feb 2021 Xi Wang, Iadh Ounis, Craig Macdonald

Furthermore, inspired by the users' information adoption framework, we integrate two loss functions and a negative sampling strategy into our proposed RPRM model, to ensure that the properties of reviews are correlated with the users' preferences.

Recommendation Systems

Learning Dual Priors for JPEG Compression Artifacts Removal

no code implementations ICCV 2021 Xueyang Fu, Xi Wang, Aiping Liu, Junwei Han, Zheng-Jun Zha

Specifically, we design a variational model to formulate the image de-blocking problem and propose two prior terms for the image content and gradient, respectively.

Blocking

CoFF: Cooperative Spatial Feature Fusion for 3D Object Detection on Autonomous Vehicles

no code implementations 24 Sep 2020 Jingda Guo, Dominic Carrillo, Sihai Tang, Qi Chen, Qing Yang, Song Fu, Xi Wang, Nannan Wang, Paparao Palacharla

To reduce the amount of transmitted data, feature map based fusion is recently proposed as a practical solution to cooperative 3D object detection by autonomous vehicles.

3D Object Detection Autonomous Vehicles +2

Toward Quantifying Ambiguities in Artistic Images

no code implementations 21 Aug 2020 Xi Wang, Zoya Bylinskii, Aaron Hertzmann, Robert Pepperell

It has long been hypothesized that perceptual ambiguities play an important role in aesthetic experience: a work with some ambiguity engages a viewer more than one that does not.

Negative Confidence-Aware Weakly Supervised Binary Classification for Effective Review Helpfulness Classification

no code implementations 14 Aug 2020 Xi Wang, Iadh Ounis, Craig Macdonald

However, a classification model that learns to classify binary instances with incomplete positive labels while assuming all unlabelled data to be negative examples will often generate a biased classifier.

Binary Classification Classification +1

Deep Mining External Imperfect Data for Chest X-ray Disease Screening

no code implementations 6 Jun 2020 Luyang Luo, Lequan Yu, Hao Chen, Quande Liu, Xi Wang, Jiaqi Xu, Pheng-Ann Heng

Recent research has demonstrated that a performance bottleneck exists in joint training on different CXR datasets, and few efforts have been made to address this obstacle.

General Classification Missing Labels +1

Unifying Structure Analysis and Surrogate-driven Function Regression for Glaucoma OCT Image Screening

no code implementations 26 Jul 2019 Xi Wang, Hao Chen, Luyang Luo, An-ran Ran, Poemen P. Chan, Clement C. Tham, Carol Y. Cheung, Pheng-Ann Heng

Besides, the proposed multi-task learning network is capable of exploring the structure and function relationship from the OCT image and visual field measurement simultaneously, which contributes to classification performance boosting.

Multi-Task Learning regression

Forward-Backward Decoding for Regularizing End-to-End TTS

1 code implementation 18 Jul 2019 Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao

Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (the revised version of Tacotron2) with a MOS gap of 0.14 in a challenging test and achieving close to human quality (4.42 vs. 4.49 in MOS) on a general test.

Decoder

Deep Angular Embedding and Feature Correlation Attention for Breast MRI Cancer Analysis

no code implementations 7 Jun 2019 Luyang Luo, Hao Chen, Xi Wang, Qi Dou, Huangjin Lin, Juan Zhou, Gongjie Li, Pheng-Ann Heng

In this paper, we propose to identify breast tumor in MRI by Cosine Margin Sigmoid Loss (CMSL) with deep learning (DL) and localize possible cancer lesion by COrrelation Attention Map (COAM) based on the learned features.

Feature Correlation Specificity

Center of circle after perspective transformation

no code implementations 12 Feb 2019 Xi Wang, Albert Chern, Marc Alexa

The boundary of the pupil is fitted with an ellipse and the euclidean center of the ellipse in the image is taken as the center of the pupil.

Large-Scale Domain Adaptation via Teacher-Student Learning

no code implementations 17 Aug 2017 Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong

High accuracy speech recognition requires a large amount of transcribed data for supervised training.

Domain Adaptation speech-recognition +1

Aggregating Frame-level Features for Large-Scale Video Classification

no code implementations 4 Jul 2017 Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.

Classification General Classification +3

Evaluating Two-Stream CNN for Video Classification

no code implementations 8 Apr 2015 Hao Ye, Zuxuan Wu, Rui-Wei Zhao, Xi Wang, Yu-Gang Jiang, Xiangyang Xue

In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification.

Classification General Classification +2

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

1 code implementation 7 Apr 2015 Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue

In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos.

Classification General Classification +1
