Search Results for author: Xin Wang

Found 461 papers, 160 papers with code

Gorilla: Large Language Model Connected with Massive APIs

1 code implementation 24 May 2023 Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis.

Hallucination Language Modelling +4

A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions

1 code implementation 15 Jun 2022 Sheng Zhou, Hongjia Xu, Zhuonan Zheng, Jiawei Chen, Zhao Li, Jiajun Bu, Jia Wu, Xin Wang, Wenwu Zhu, Martin Ester

Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, in this paper we conduct a comprehensive survey on deep clustering by proposing a new taxonomy of different state-of-the-art approaches.

Clustering Deep Clustering +1

Automated Machine Learning on Graphs: A Survey

2 code implementations 1 Mar 2021 Ziwei Zhang, Xin Wang, Wenwu Zhu

Machine learning on graphs has been extensively studied in both academia and industry.

BIG-bench Machine Learning Graph Learning +1

Frustratingly Simple Few-Shot Object Detection

5 code implementations ICML 2020 Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.

Few-Shot Object Detection Meta-Learning +2

HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding

2 code implementations 31 Jan 2019 Sheng Zhou, Jiajun Bu, Xin Wang, Jia-Wei Chen, Can Wang

Second, given a meta path, nodes in HIN are connected by path instances while existing works fail to fully explore the differences between path instances that reflect nodes' preferences in the semantic space.

Network Embedding

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations 17 Apr 2019 Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

Efficient Large Language Models: A Survey

3 code implementations 6 Dec 2023 Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society.

Natural Language Understanding Text Generation

Variational quantum Gibbs state preparation with a truncated Taylor series

1 code implementation 18 May 2020 Youle Wang, Guangxi Li, Xin Wang

By performing numerical experiments, we show that shallow parameterized circuits with only one additional qubit can be trained to prepare the Ising chain and spin chain Gibbs states with a fidelity higher than 95%.

Quantum Machine Learning

Variational Quantum Singular Value Decomposition

1 code implementation 3 Jun 2020 Xin Wang, Zhixin Song, Youle Wang

In this work, we propose a variational quantum algorithm for singular value decomposition (VQSVD).

Image Compression Recommendation Systems

Variational Quantum Algorithms for Trace Distance and Fidelity Estimation

1 code implementation 10 Dec 2020 Ranyiliu Chen, Zhixin Song, Xuanqiang Zhao, Xin Wang

A novel variational algorithm for trace distance estimation is then derived from this technique, with the assistance of a single ancillary qubit.

Quantum Physics Information Theory Mathematical Physics Optimization and Control

Noise-Assisted Quantum Autoencoder

1 code implementation 15 Dec 2020 Chenfeng Cao, Xin Wang

Based on this understanding, we present a noise-assisted quantum autoencoder algorithm to go beyond the limitations; our model can achieve high recovery fidelity for general input states.

Quantum Physics

VSQL: Variational Shadow Quantum Learning for Classification

1 code implementation 15 Dec 2020 Guangxi Li, Zhixin Song, Xin Wang

Classification of quantum data is essential for quantum machine learning and near-term quantum technologies.

BIG-bench Machine Learning Classification +3

Detecting and quantifying entanglement on near-term quantum devices

1 code implementation 28 Dec 2020 Kun Wang, Zhixin Song, Xuanqiang Zhao, Zihe Wang, Xin Wang

Firstly, it decomposes a positive map into a combination of quantum operations implementable on near-term quantum devices.

Quantum Physics Strongly Correlated Electrons

Practical distributed quantum information processing with LOCCNet

2 code implementations 28 Jan 2021 Xuanqiang Zhao, Benchi Zhao, Zihe Wang, Zhixin Song, Xin Wang

Here we introduce LOCCNet, a machine learning framework facilitating protocol design and optimization for distributed quantum information processing tasks.

BIG-bench Machine Learning Quantum Machine Learning

Few-shot Object Detection via Feature Reweighting

4 code implementations ICCV 2019 Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.

Few-Shot Learning Few-Shot Object Detection +3

Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation

1 code implementation ECCV 2018 Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang

In this paper, we take a radical approach to bridge the gap between synthetic studies and real-world practices: we propose a novel, planned-ahead hybrid reinforcement learning model that combines model-free and model-based reinforcement learning to solve a real-world vision-language navigation task.

Model-based Reinforcement Learning reinforcement-learning +4

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

3 code implementations CVPR 2020 Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.

Autonomous Driving Domain Adaptation +8

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

1 code implementation CVPR 2022 Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

Recent self-supervised pretraining methods for object detection largely focus on pretraining the backbone of the object detector, neglecting key parts of detection architecture.

Few-Shot Learning Few-Shot Object Detection +6

Estimating the confidence of speech spoofing countermeasure

1 code implementation 10 Oct 2021 Xin Wang, Junichi Yamagishi

On the ASVspoof2019 logical access database, the results demonstrate that an energy-based estimator and a neural-network-based one achieved acceptable performance in identifying unknown attacks in the test set.

A Practical Guide to Logical Access Voice Presentation Attack Detection

1 code implementation 10 Jan 2022 Xin Wang, Junichi Yamagishi

Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable.

Artifact Detection Speaker Verification +2

Investigating Active-learning-based Training Data Selection for Speech Spoofing Countermeasure

1 code implementation 28 Mar 2022 Xin Wang, Junichi Yamagishi

This study took the initiative and investigated CM training using active learning (AL), a framework that iteratively selects useful data from a large pool set and fine-tunes the CM.

Active Learning Data Augmentation +1

Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

1 code implementation 19 Oct 2022 Xin Wang, Junichi Yamagishi

To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion.

Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?

1 code implementation 12 Sep 2023 Xin Wang, Junichi Yamagishi

While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders were also effective as the spoofed training data.

Self-Supervised Learning Speech Synthesis

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations 23 Oct 2019 Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis

1 code implementation 14 Nov 2023 Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang

By fixing the weight of ViT models and only adding small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities using only a few trainable parameters.

SkipNet: Learning Dynamic Routing in Convolutional Networks

2 code implementations ECCV 2018 Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.

Decision Making

Graph Meets LLMs: Towards Large Graph Models

1 code implementation 28 Aug 2023 Ziwei Zhang, Haoyang Li, Zeyang Zhang, Yijian Qin, Xin Wang, Wenwu Zhu

To advance the application of large models to graphs, we present a perspective paper to discuss the challenges and opportunities associated with developing large graph models.

TOAST: Transfer Learning via Attention Steering

1 code implementation 24 May 2023 Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang

We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features.

Fine-Grained Image Classification Instruction Following +2

A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery

1 code implementation 16 Apr 2023 Hanlei Zhang, Hua Xu, Xin Wang, Fei Long, Kai Gao

New intent discovery is of great value to natural language processing, allowing for a better understanding of user needs and providing friendly services.

Clustering Intent Discovery +3

Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT

1 code implementation Radiology 2020 Lin Li, Lixin Qin, Zeguo Xu, Youbing Yin, Xin Wang, Bin Kong, Junjie Bai, Yi Lu, Zhenghan Fang, Qi Song, Kunlin Cao, Daliang Liu, Guisheng Wang, Qizhong Xu, Xisheng Fang, Shiqin Zhang, Juan Xia, Jun Xia

Materials and Methods: In this retrospective and multi-center study, a deep learning model, COVID-19 detection neural network (COVNet), was developed to extract visual features from volumetric chest CT exams for the detection of COVID-19.

COVID-19 Image Segmentation Specificity

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation 1 Sep 2021 Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping Speaker Verification

Investigating self-supervised front ends for speech spoofing countermeasures

1 code implementation 15 Nov 2021 Xin Wang, Junichi Yamagishi

Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks.

Face Swapping

Neural-Sim: Learning to Generate Training Data with NeRF

1 code implementation 22 Jul 2022 Yunhao Ge, Harkirat Behl, Jiashu Xu, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, Vibhav Vineet

However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and is often suboptimal for the target domain.

Object Detection

Top-Down Visual Attention from Analysis by Synthesis

1 code implementation CVPR 2023 Baifeng Shi, Trevor Darrell, Xin Wang

In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.

Retrieval Semantic Segmentation +1

When Do We Not Need Larger Vision Models?

1 code implementation 19 Mar 2024 Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell

Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.

Depth Estimation

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

2 code implementations ACL 2018 Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem.

Image Captioning Visual Storytelling

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

1 code implementation 29 Oct 2018 Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

Speech Synthesis Text-To-Speech Synthesis

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation CVPR 2020 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

Dichotomic Pattern Mining with Applications to Intent Prediction from Semi-Structured Clickstream Datasets

2 code implementations 23 Jan 2022 Xin Wang, Serdar Kadioglu

We introduce a pattern mining framework that operates on semi-structured datasets and exploits the dichotomy between outcomes.

Active Learning Meets Optimized Item Selection

2 code implementations 22 Nov 2021 Bernard Kleynhans, Xin Wang, Serdar Kadıoğlu

Designing recommendation systems with limited or no available training data remains a challenge.

Active Learning Clustering +2

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

1 code implementation ICCV 2021 Zeyu Hu, Xuyang Bai, Jiaxiang Shang, Runze Zhang, Jiayu Dong, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai

Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters).

3D Semantic Segmentation

SWIPENET: Object detection in noisy underwater images

1 code implementation 19 Oct 2020 Long Chen, Feixiang Zhou, Shengke Wang, Junyu Dong, Ning Li, Haiping Ma, Xin Wang, Huiyu Zhou

Moreover, inspired by the human education process that drives the learning from easy to hard concepts, we here propose the CMA training paradigm that first trains a clean detector which is free from the influence of noisy data.

Object object-detection +1

VTimeLLM: Empower LLM to Grasp Video Moments

1 code implementation 30 Nov 2023 Bin Huang, Xin Wang, Hong Chen, Zihan Song, Wenwu Zhu

Large language models (LLMs) have shown remarkable text understanding capabilities, which have been extended as Video LLMs to handle video data for comprehending visual details.

Dense Video Captioning Video-based Generative Performance Benchmarking (Consistency) +5

Enhancing Video Super-Resolution via Implicit Resampling-based Alignment

1 code implementation arXiv 2024 Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao

We show that bilinear interpolation inherently attenuates high-frequency information while an MLP-based coordinate network can approximate more frequencies.

Video Super-Resolution

Dynamic Multi-scale Convolution for Dialect Identification

1 code implementation 2 Aug 2021 Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan Song, Jinwen Huang, Huiyu Shi, Xiaorui Wang

To address this issue, we propose a new architecture, named dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling.

Dialect Identification

Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network

1 code implementation 26 Jul 2019 Xin Wang, Bo Wu, Yun Ye, Yueqi Zhong

Existing works about fashion outfit compatibility focus on predicting the overall compatibility of a set of fashion items with their information from different modalities.

Fashion Compatibility Learning Persuasiveness

Robust Contrastive Learning against Noisy Views

1 code implementation CVPR 2022 Ching-Yao Chuang, R Devon Hjelm, Xin Wang, Vibhav Vineet, Neel Joshi, Antonio Torralba, Stefanie Jegelka, Yale Song

Contrastive learning relies on an assumption that positive pairs contain related views, e.g., patches of an image or co-occurring multimodal signals of a video, that share certain underlying information about an instance.

Binary Classification Contrastive Learning

Robust Object Detection via Instance-Level Temporal Cycle Confusion

1 code implementation ICCV 2021 Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell

Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.

Object object-detection +2

The VoicePrivacy 2022 Challenge Evaluation Plan

1 code implementation 23 Mar 2022 Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

Participants apply their developed anonymization systems, run evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers.

Speaker Verification

Introducing the VoicePrivacy Initiative

3 code implementations 4 May 2020 Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

The VoicePrivacy 2020 Challenge Evaluation Plan

1 code implementation 14 May 2022 Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video

1 code implementation CVPR 2019 Samvit Jain, Xin Wang, Joseph Gonzalez

We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that extracts high-detail features on a reference keyframe, and warps these features forward using frame-to-frame optical flow estimates, and (2) an update branch that computes features of adjustable quality on the current frame, performing a temporal update at each video frame.

Optical Flow Estimation Segmentation +3

Speech waveform synthesis from MFCC sequences with generative adversarial networks

1 code implementation 3 Apr 2018 Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis.

Generative Adversarial Network Speech Synthesis

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

1 code implementation CVPR 2019 Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez

We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.

Attribute Few-Shot Learning +1

Task-Aware Feature Generation for Zero-Shot Compositional Learning

1 code implementation 11 Jun 2019 Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez

In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.

Novel Concepts Zero-Shot Learning

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

2 code implementations ICCV 2019 Xin Wang, Jiawei Wu, Junkun Chen, Lei LI, Yuan-Fang Wang, William Yang Wang

We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.

Machine Translation Translation +3

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

1 code implementation 4 Apr 2021 Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

Probabilistic linear discriminant analysis (PLDA) or cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities.

Speaker Verification

MetaDelta: A Meta-Learning System for Few-shot Image Classification

1 code implementation 22 Feb 2021 Yudong Chen, Chaoyu Guan, Zhikun Wei, Xin Wang, Wenwu Zhu

Meta-learning aims at learning quickly on novel tasks with limited data by transferring generic experience learned from previous tasks.

Classification Few-Shot Image Classification +2

Stochastic Actor-Executor-Critic for Image-to-Image Translation

1 code implementation 14 Dec 2021 Ziwei Luo, Jing Hu, Xin Wang, Siwei Lyu, Bin Kong, Youbing Yin, Qi Song, Xi Wu

Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult since it involves high-dimensional continuous state and action spaces.

Continuous Control Image-to-Image Translation +3

GPPT: Graph Pre-training and Prompt Tuning to Generalize Graph Neural Networks

1 code implementation SIGKDD 2022 Mingchen Sun, Kaixiong Zhou, Xin He, Ying Wang, Xin Wang

Based on the pre-trained model, we propose the graph prompting function to modify the standalone node into a token pair, and reformulate downstream node classification to look the same as edge prediction.

Few-Shot Learning Node Classification +3

Image-to-Image Translation with Deep Reinforcement Learning

1 code implementation 24 Sep 2023 Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Xin Li, Siwei Lyu

The key feature in the RL-I2IT framework is to decompose a monolithic learning process into small steps with a lightweight model to progressively transform a source image into a target image.

Auxiliary Learning Decision Making +3

STFT spectral loss for training a neural speech waveform model

1 code implementation 29 Oct 2018 Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi

This paper proposes a new loss using short-time Fourier transform (STFT) spectra for the aim of training a high-performance neural speech waveform model that predicts raw continuous speech waveform samples directly.

Distilling Holistic Knowledge with Graph Neural Networks

1 code implementation ICCV 2021 Sheng Zhou, Yucheng Wang, Defang Chen, Jiawei Chen, Xin Wang, Can Wang, Jiajun Bu

The holistic knowledge is represented as a unified graph-based embedding by aggregating individual knowledge from relational neighborhood samples with graph neural networks; the student network is learned by distilling the holistic knowledge in a contrastive manner.

Knowledge Distillation

Self-Supervised Learning for Contextualized Extractive Summarization

2 code implementations ACL 2019 Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang

Existing models for extractive summarization are usually trained from scratch with a cross-entropy loss, which does not explicitly capture the global context at the document level.

Extractive Summarization Self-Supervised Learning

XL-NBT: A Cross-lingual Neural Belief Tracking Framework

1 code implementation EMNLP 2018 Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang

Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data.

Transfer Learning

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

1 code implementation 10 Nov 2019 Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones.

Sound Audio and Speech Processing

Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer

2 code implementations CVPR 2017 Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang

That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.

Style Transfer

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction

1 code implementation 25 Nov 2020 Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang

This strategy estimates the depth map at the coarsest level, while the depth map at each finer level is obtained as the upsampled depth map from the previous level plus a pixel-wise depth residual.

3D Reconstruction

Visual Attention Emerges from Recurrent Sparse Reconstruction

1 code implementation 23 Apr 2022 Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang

We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.

LiDAL: Inter-frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation

1 code implementation 11 Nov 2022 Zeyu Hu, Xuyang Bai, Runze Zhang, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai

We propose LiDAL, a novel active learning method for 3D LiDAR semantic segmentation by exploiting inter-frame uncertainty among LiDAR frames.

Active Learning LIDAR Semantic Segmentation +1

Detecting Multimedia Generated by Large AI Models: A Survey

1 code implementation 22 Jan 2024 Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life.

Disentangled Self-Supervision in Sequential Recommenders

1 code implementation 23 Aug 2020 Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, Wenwu Zhu

There exist two challenges: i) reconstructing a future sequence containing many behaviors is exponentially harder than reconstructing a single next behavior, which can lead to difficulty in convergence, and ii) the sequence of all future behaviors can involve many intentions, not all of which may be predictable from the sequence of earlier behaviors.

Disentanglement

A Unified Approach to Interpreting and Boosting Adversarial Transferability

1 code implementation 8 Oct 2020 Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, Quanshi Zhang

We discover and prove the negative correlation between the adversarial transferability and the interaction inside adversarial perturbations.

Doubly Right Object Recognition: A Why Prompt for Visual Rationales

1 code implementation CVPR 2023 Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick

We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales.

Object Recognition

DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation

1 code implementation 5 May 2023 Hong Chen, YiPeng Zhang, Simin Wu, Xin Wang, Xuguang Duan, Yuwei Zhou, Wenwu Zhu

To tackle the problems, we propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation.

Denoising Disentanglement +1

NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search

1 code implementation 18 Jun 2022 Yijian Qin, Ziwei Zhang, Xin Wang, Zeyang Zhang, Wenwu Zhu

To the best of our knowledge, our work is the first benchmark for graph neural architecture search.

Benchmarking Neural Architecture Search

Instance-Aware Predictive Navigation in Multi-Agent Environments

1 code implementation 14 Jan 2021 Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu

To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.

Curriculum Disentangled Recommendation with Noisy Multi-feedback

1 code implementation NeurIPS 2021 Hong Chen, Yudong Chen, Xin Wang, Ruobing Xie, Rui Wang, Feng Xia, Wenwu Zhu

However, learning such disentangled representations from multi-feedback data is challenging because i) multi-feedback is complex: there exist complex relations among different types of feedback (e.g., click, unclick, and dislike) as well as various user intentions, and ii) multi-feedback is noisy: there exists noisy (useless) information both in features and labels, which may deteriorate the recommendation performance.

Denoising Representation Learning

Range-Based Equal Error Rate for Spoof Localization

1 code implementation28 May 2023 Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER.

A Unified Game-Theoretic Interpretation of Adversarial Robustness

1 code implementation12 Mar 2021 Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs.

Adversarial Robustness

A Unified Game-Theoretic Interpretation of Adversarial Robustness

1 code implementation5 Nov 2021 Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs.

Adversarial Robustness

Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness

1 code implementation NeurIPS 2021 Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs.

Adversarial Robustness

Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum

1 code implementation7 Apr 2022 Zeyang Zhang, Ziwei Zhang, Xin Wang, Wenwu Zhu

To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances.

Combinatorial Optimization

When does reinforcement learning stand out in quantum control? A comparative study on state preparation

2 code implementations6 Feb 2019 Xiao-Ming Zhang, Zezhu Wei, Raza Asad, Xu-Chen Yang, Xin Wang

In this work, we perform a comparative study on the efficacy of three reinforcement learning algorithms: tabular Q-learning, deep Q-learning, and policy gradient, as well as two non-machine-learning methods: stochastic gradient descent and Krotov algorithms, in the problem of preparing a desired quantum state.

Quantum Physics

Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment

1 code implementation22 Aug 2020 Geeticka Chauhan, Ruizhi Liao, William Wells, Jacob Andreas, Xin Wang, Seth Berkowitz, Steven Horng, Peter Szolovits, Polina Golland

To take advantage of the rich information present in the radiology reports, we develop a neural network model that is trained on both images and free-text to assess pulmonary edema severity from chest radiographs at inference time.

Image Classification Representation Learning

Synthesis-based Imaging-Differentiation Representation Learning for Multi-Sequence 3D/4D MRI

1 code implementation1 Feb 2023 Luyi Han, Tao Tan, Tianyu Zhang, Yunzhi Huang, Xin Wang, Yuan Gao, Jonas Teuwen, Ritse Mann

Multi-sequence MRIs can be necessary for reliable diagnosis in clinical practice due to the complementary information within sequences.

Representation Learning

An Explainable Deep Framework: Towards Task-Specific Fusion for Multi-to-One MRI Synthesis

1 code implementation3 Jul 2023 Luyi Han, Tianyu Zhang, Yunzhi Huang, Haoran Dou, Xin Wang, Yuan Gao, Chunyao Lu, Tan Tao, Ritse Mann

Multi-sequence MRI is valuable in clinical settings for reliable diagnosis and treatment prognosis, but some sequences may be unusable or missing for various reasons.

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks

1 code implementation21 Jul 2018 Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang

In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network.

Action Detection Activity Detection

Medical Matting: A New Perspective on Medical Segmentation with Uncertainty

1 code implementation18 Jun 2021 Lin Wang, Lie Ju, Xin Wang, Wanji He, Donghao Zhang, Yelin Huang, Zhiwen Yang, Xuan Yao, Xin Zhao, Xiufen Ye, ZongYuan Ge

None of them investigate the influence of the ambiguous nature of the lesion itself. Inspired by image matting, this paper introduces alpha matte as a soft mask to represent uncertain areas in medical scenes and accordingly puts forward a new uncertainty quantification method to fill the gap of uncertainty research for lesion structure.

Image Matting Image Segmentation +3

A region-growing approach for automatic outcrop fracture extraction from a three-dimensional point cloud

1 code implementation27 Jun 2017 Xin Wang, Lejun Zou, Xiaohua Shen, Yupeng Ren, Yi Qin

In tests using outcrop point cloud data, the proposed method identified and extracted the full extent of individual fractures with high accuracy.

Wanderlust: Online Continual Object Detection in the Real World

1 code implementation ICCV 2021 Jianren Wang, Xin Wang, Yue Shang-Guan, Abhinav Gupta

To bridge the gap, we present a new online continual object detection benchmark with an egocentric video dataset, Objects Around Krishna (OAK).

Continual Learning Object +2

Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline

1 code implementation29 Nov 2022 Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf

The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes.

Voice Conversion

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

1 code implementation12 Mar 2024 Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang

However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression loss, and the remaining model parameters are not updated after SVD truncation.

Language Modelling Large Language Model +1

BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks

1 code implementation31 Aug 2023 Qiang Huang, Jiawei Jiang, Xi Susie Rao, Ce Zhang, Zhichao Han, Zitao Zhang, Xin Wang, Yongjun He, Quanqing Xu, Yang Zhao, Chuang Hu, Shuo Shang, Bo Du

To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed.

Link Prediction Node Classification

Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

1 code implementation16 Jul 2020 Lingwei Wei, Dou Hu, Wei Zhou, Xuehai Tang, Xiaodan Zhang, Xin Wang, Jizhong Han, Songlin Hu

Furthermore, we design a Sentiment-based Rethinking mechanism (SR) by refining the HIN with sentiment label information to learn a more sentiment-aware document representation.

Sentiment Analysis Sentiment Classification +1

Learning by Minimizing the Sum of Ranked Range

1 code implementation NeurIPS 2020 Shu Hu, Yiming Ying, Xin Wang, Siwei Lyu

In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output.

Binary Classification General Classification +2
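
As a rough illustration of aggregating individual values into a single output, the sketch below computes a sum of ranked range in the sense suggested by the title: the sum of the values ranked from (m+1)-th to k-th largest. The function name, argument names, and sample values are assumptions made for this example, not taken from the paper's code.

```python
def sum_of_ranked_range(values, m, k):
    """Sum of the (m+1)-th through k-th largest entries of `values`.

    Hypothetical helper for illustration only; m and k are rank cut-offs
    with 0 <= m < k <= len(values).
    """
    if not 0 <= m < k <= len(values):
        raise ValueError("need 0 <= m < k <= len(values)")
    ranked = sorted(values, reverse=True)  # largest value first
    return sum(ranked[m:k])                # ranks m+1 .. k (1-indexed)


# Illustrative per-sample losses (made-up integers).
losses = [1, 25, 7, 90, 12]
print(sum_of_ranked_range(losses, 1, 3))  # skips the largest loss, sums the next two
```

With m = 0 this reduces to the sum of the top-k values, and with m = k - 1 it returns the k-th largest value alone; dividing by k - m gives an average over the same range.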

Sum of Ranked Range Loss for Supervised Learning

1 code implementation7 Jun 2021 Shu Hu, Yiming Ying, Xin Wang, Siwei Lyu

A combination loss of AoRR and TKML is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in samples and labels alike.

Multi-class Classification Multi-Label Learning

Orthogonal Graph Neural Networks

1 code implementation23 Sep 2021 Kai Guo, Kaixiong Zhou, Xia Hu, Yu Li, Yi Chang, Xin Wang

Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations.

Attribute Graph Classification

Not All Low-Pass Filters are Robust in Graph Convolutional Networks

1 code implementation NeurIPS 2021 Heng Chang, Yu Rong, Tingyang Xu, Yatao Bian, Shiji Zhou, Xin Wang, Junzhou Huang, Wenwu Zhu

Graph Convolutional Networks (GCNs) are promising deep learning approaches in learning representations for graph-structured data.

Scaling Novel Object Detection with Weakly Supervised Detection Transformers

1 code implementation11 Jul 2022 Tyler LaBonte, Yale Song, Xin Wang, Vibhav Vineet, Neel Joshi

A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect.

Multiple Instance Learning Novel Object Detection +4

Unlocking the Potential of Deep Learning in Peak-Hour Series Forecasting

1 code implementation4 Jul 2023 Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu

Unlocking the potential of deep learning in Peak-Hour Series Forecasting (PHSF) remains a critical yet underexplored task in various domains.

Time Series Time Series Forecasting

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

1 code implementation16 Mar 2024 Hanlei Zhang, Xin Wang, Hua Xu, Qianrui Zhou, Kai Gao, Jianhua Su, jinyue Zhao, Wenrui Li, Yanting Chen

We believe that MIntRec2.0 will serve as a valuable resource, providing a pioneering foundation for research in human-machine conversational interactions, and significantly facilitating related applications.

Multimodal Intent Recognition

Out-Of-Distribution Generalization on Graphs: A Survey

1 code implementation16 Feb 2022 Haoyang Li, Xin Wang, Ziwei Zhang, Wenwu Zhu

This paper is the first systematic and comprehensive review of OOD generalization on graphs, to the best of our knowledge.

Out-of-Distribution Generalization

Adversarial Prompt Tuning for Vision-Language Models

1 code implementation19 Nov 2023 Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang

With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities.

Adversarial Robustness

PokeMQA: Programmable knowledge editing for Multi-hop Question Answering

1 code implementation23 Dec 2023 Hengrui Gu, Kaixiong Zhou, Xiaotian Han, Ninghao Liu, Ruobing Wang, Xin Wang

Multi-hop question answering (MQA) is one of the challenging tasks for evaluating a machine's comprehension and reasoning abilities, and one on which large language models (LLMs) have widely achieved human-comparable performance.

Answer Generation knowledge editing +3

Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

1 code implementation5 Feb 2024 Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, QIngwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie

To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data.

Anomaly Detection Time Series +1

T$_k$ML-AP: Adversarial Attacks to Top-$k$ Multi-Label Learning

1 code implementation31 Jul 2021 Shu Hu, Lipeng Ke, Xin Wang, Siwei Lyu

Top-$k$ multi-label learning, which returns the top-$k$ predicted labels from an input, has many practical applications such as image annotation, document analysis, and web search engines.

Multi-Label Learning
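
The top-k prediction step described in this entry can be sketched as follows: given one score per label, return the k labels with the highest scores. The function name, the label names, and the scores are hypothetical, and this shows only the prediction rule, not the attack proposed in the paper.

```python
def top_k_labels(scores, k):
    """Return the k labels with the highest scores, best first.

    `scores` maps each label to a real-valued prediction score
    (a hypothetical input format chosen for this sketch).
    """
    if not 0 < k <= len(scores):
        raise ValueError("need 0 < k <= number of labels")
    # Sort (label, score) pairs by score, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Made-up scores for a single image.
scores = {"cat": 0.9, "dog": 0.2, "grass": 0.7, "car": 0.05}
print(top_k_labels(scores, 2))
```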

TkML-AP: Adversarial Attacks to Top-k Multi-Label Learning

1 code implementation ICCV 2021 Shu Hu, Lipeng Ke, Xin Wang, Siwei Lyu

Top-k multi-label learning, which returns the top-k predicted labels from an input, has many practical applications such as image annotation, document analysis, and web search engines.

Multi-Label Learning

Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval

1 code implementation26 May 2023 Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Yanjun Wang

Given a query caption, the goal is to rank candidate images by relevance, from large to small.

Retrieval Text Retrieval

Compositional Coding for Collaborative Filtering

1 code implementation9 May 2019 Chenghao Liu, Tao Lu, Xin Wang, Zhiyong Cheng, Jianling Sun, Steven C. H. Hoi

However, CF with binary codes naturally suffers from low accuracy due to limited representation capability in each bit, which impedes it from modeling complex structure of the data.

Collaborative Filtering Recommendation Systems

Reproducible and Portable Big Data Analytics in the Cloud

1 code implementation17 Dec 2021 Xin Wang, Pei Guo, Xingyan Li, Aryya Gangopadhyay, Carl E. Busart, Jade Freeman, Jianwu Wang

To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds.

Cloud Computing Descriptive

Quantum Self-Attention Neural Networks for Text Classification

1 code implementation11 May 2022 Guangxi Li, Xuanqiang Zhao, Xin Wang

An emerging direction of quantum computing is to establish meaningful quantum applications in various fields of artificial intelligence, including natural language processing (NLP).

text-classification Text Classification

Rethinking Propagation for Unsupervised Graph Domain Adaptation

1 code implementation8 Feb 2024 Meihan Liu, Zeyu Fang, Zhen Zhang, Ming Gu, Sheng Zhou, Xin Wang, Jiajun Bu

Motivated by our empirical analysis, we reevaluate the role of GNNs in graph domain adaptation and uncover the pivotal role of the propagation process in GNNs for adapting to different graph domains.

Domain Adaptation

A Multi-Level Attention Model for Evidence-Based Fact Checking

1 code implementation Findings (ACL) 2021 Canasai Kruengkrai, Junichi Yamagishi, Xin Wang

Evidence-based fact checking aims to verify the truthfulness of a claim against evidence extracted from textual sources.

Fact Checking Sentence

Generalizable control for quantum parameter estimation through reinforcement learning

1 code implementation25 Apr 2019 Han Xu, Junning Li, Liqiang Liu, Yu Wang, Haidong Yuan, Xin Wang

Measurement and estimation of parameters are essential for science and engineering, where one of the main quests is to find systematic schemes that can achieve high precision.

Quantum Physics Mesoscale and Nanoscale Physics

Multimodal Gait Recognition for Neurodegenerative Diseases

1 code implementation7 Jan 2021 Aite Zhao, Jianbo Li, Junyu Dong, Lin Qi, Qianni Zhang, Ning li, Xin Wang, Huiyu Zhou

In recent years, single modality based gait recognition has been extensively explored in the analysis of medical images or other sensory data, and it is recognised that each of the established approaches has different strengths and weaknesses.

Gait Recognition

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation11 Jun 2021 Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification Voice Anti-spoofing

IMPORTANT-Net: Integrated MRI Multi-Parameter Reinforcement Fusion Generator with Attention Network for Synthesizing Absent Data

1 code implementation3 Feb 2023 Tianyu Zhang, Tao Tan, Luyi Han, Xin Wang, Yuan Gao, Jonas Teuwen, Regina Beets-Tan, Ritse Mann

Then the multi-parameter fusion with attention module enables the interaction of the encoded information from different parameters through a set of algorithmic strategies, and applies different weights to the information through the attention mechanism after information fusion to obtain refined representation information.

Lesion Classification Lesion Detection

HDG-ODE: A Hierarchical Continuous-Time Model for Human Pose Forecasting

1 code implementation ICCV 2023 Yucheng Xing, Xin Wang

Considering the structural-property of the skeleton data in representing human poses and the possible irregularity caused by occlusion, we propose the use of dynamic graph convolution as the basic operator.

Human Pose Forecasting

Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts

1 code implementation NeurIPS 2023 Zeyang Zhang, Xin Wang, Ziwei Zhang, Zhou Qin, Weigao Wen, Hui Xue, Haoyang Li, Wenwu Zhu

In this paper, we discover that there exist cases with distribution shifts unobservable in the time domain while observable in the spectral domain, and propose to study distribution shifts on dynamic graphs in the spectral domain for the first time.

Link Prediction Node Classification

Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting

1 code implementation6 Apr 2021 Xin Wang, Yang Zhao, Tangwen Yang, Qiuqi Ruan

In this paper, we propose a multi-scale context aggregation network (MSCANet) based on single-column encoder-decoder architecture for crowd counting, which consists of an encoder based on a dense context-aware module (DCAM) and a hierarchical attention-guided decoder.

Crowd Counting

Unsupervised Domain Adaptive Fundus Image Segmentation with Category-level Regularization

1 code implementation8 Jul 2022 Wei Feng, Lin Wang, Lie Ju, Xin Zhao, Xin Wang, Xiaoyu Shi, ZongYuan Ge

Existing unsupervised domain adaptation methods based on adversarial learning have achieved good performance in several medical imaging tasks.

Image Segmentation Semantic Segmentation +1

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

1 code implementation22 Jul 2022 Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhongqi Li, Qi Zhang, Meng Xiao, Bo Shen, Lin Li, Hao Yu, Li Yan, Pingyi Zhou, Xin Wang, Yuchi Ma, Ignacio Iacobacci, Yasheng Wang, Guangtai Liang, Jiansheng Wei, Xin Jiang, Qianxiang Wang, Qun Liu

We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e., the synthesis of programming language solutions given a natural language problem description.

Code Generation Language Modelling +2

Outlier Robust Adversarial Training

1 code implementation10 Sep 2023 Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

Theoretically, we show that the learning objective of ORAT satisfies the $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss.

Adversarial Attack Binary Classification

On-the-Fly SfM: What you capture is What you get

1 code implementation21 Sep 2023 Zongqian Zhan, Rui Xia, Yifei Yu, Yibo Xu, Xin Wang

Over the last decades, ample achievements have been made on Structure from motion (SfM).

Image Registration Image Retrieval +1

Deep Learning to Quantify Pulmonary Edema in Chest Radiographs

1 code implementation13 Aug 2020 Steven Horng, Ruizhi Liao, Xin Wang, Sandeep Dalal, Polina Golland, Seth J. Berkowitz

Results: The area under the receiver operating characteristic curve (AUC) for differentiating alveolar edema from no edema was 0.99 for the semi-supervised model and 0.87 for the pre-trained models.

Efficiently Solve the Max-cut Problem via a Quantum Qubit Rotation Algorithm

1 code implementation15 Oct 2021 Xin Wang

The quantum circuits are composed of single-qubit rotation gates applied to each qubit.

Combinatorial Optimization

Unsupervised Multiplex Graph Learning with Complementary and Consistent Information

1 code implementation3 Aug 2023 Liang Peng, Xin Wang, Xiaofeng Zhu

Unsupervised multiplex graph learning (UMGL) has been shown to achieve significant effectiveness for different downstream tasks by exploring both complementary information and consistent information among multiple graphs.

Graph Learning Representation Learning

Robust COVID-19 Detection in CT Images with CLIP

1 code implementation13 Mar 2024 Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu

In the realm of medical imaging, particularly for COVID-19 detection, deep learning models face substantial challenges such as the necessity for extensive computational resources, the paucity of well-annotated datasets, and a significant amount of unlabeled data.

Robust Light-Weight Facial Affective Behavior Recognition with CLIP

1 code implementation14 Mar 2024 Li Lin, Sarah Papabathini, Xin Wang, Shu Hu

Human affective behavior analysis aims to delve into human expressions and behaviors to deepen our understanding of human emotions.

Deep Mixture of Experts via Shallow Embedding

no code implementations5 Jun 2018 Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez

Larger networks generally have greater representational power at the cost of increased computational complexity.

Few-Shot Learning Zero-Shot Learning

Fast Weight Long Short-Term Memory

no code implementations18 Apr 2018 T. Anderson Keller, Sharath Nittur Sridhar, Xin Wang

Associative memory using fast weights is a short-term memory mechanism that substantially improves the memory capacity and time scale of recurrent neural networks (RNNs).

Retrieval

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

no code implementations7 Apr 2018 Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.

Speech Synthesis

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

no code implementations2 Mar 2018 Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.

Generative Adversarial Network Speech Enhancement +2

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

no code implementations NeurIPS 2017 Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

Here we present the Flexpoint data format, aiming at a complete replacement of 32-bit floating point format training and inference, designed to support modern deep network topologies without modifications.

Generative Adversarial Network

IDK Cascades: Fast Deep Learning by Learning not to Overthink

no code implementations3 Jun 2017 Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, Joseph E. Gonzalez

Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions.

Dialogue Generation

Deep Reinforcement Learning for Visual Object Tracking in Videos

no code implementations31 Jan 2017 Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang

In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame.

Decision Making Object +4

Stochastic Averaging for Constrained Optimization with Application to Online Resource Allocation

no code implementations7 Oct 2016 Tianyi Chen, Aryan Mokhtari, Xin Wang, Alejandro Ribeiro, Georgios B. Giannakis

Existing approaches to resource allocation for today's stochastic networks are challenged to meet fast convergence and tolerable delay requirements.

Classification of Neurological Gait Disorders Using Multi-task Feature Learning

no code implementations8 Dec 2016 Ioannis Papavasileiou, Wenlong Zhang, Xin Wang, Jinbo Bi, Li Zhang, Song Han

An advanced machine learning method, multi-task feature learning (MTFL), is used to jointly train classification models of a subject's gait in three classes, post-stroke, PD and healthy gait.

Classification General Classification

Robust Learning with Kernel Mean p-Power Error Loss

no code implementations21 Dec 2016 Badong Chen, Lei Xing, Xin Wang, Jing Qin, Nanning Zheng

Correntropy is a second order statistical measure in kernel space, which has been successfully applied in robust learning and signal processing.

On Multiplicative Multitask Feature Learning

no code implementations NeurIPS 2014 Xin Wang, Jinbo Bi, Shipeng Yu, Jiangwen Sun

We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers.

A multi-task learning model for malware classification with useful file access pattern from API call sequence

no code implementations19 Oct 2016 Xin Wang, Siu Ming Yiu

Based on API call sequences, semantic-aware and machine learning (ML) based malware classifiers can be built for malware detection or classification.

Classification Document Classification +6

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

no code implementations2 Aug 2018 Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder.

Denoising Speech Synthesis

Non-asymptotic entanglement distillation

1 code implementation19 Jun 2017 Kun Fang, Xin Wang, Marco Tomamichel, Runyao Duan

For isotropic states, it can be further simplified to a linear program.

Quantum Physics

Neural source-filter-based waveform model for statistical parametric speech synthesis

no code implementations29 Oct 2018 Xin Wang, Shinji Takaki, Junichi Yamagishi

Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure.

Speech Synthesis

Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

no code implementations29 Oct 2018 Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen

Transforming the facial and acoustic features together makes it possible for the converted voice and facial expressions to be highly correlated and for the generated target speaker to appear and sound natural.

Image Reconstruction

Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning

no code implementations7 Nov 2018 Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang

Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus, and do not generalize to open vocabulary scenarios.

Video Captioning

Guided Feature Selection for Deep Visual Odometry

no code implementations25 Nov 2018 Fei Xue, Qiuyuan Wang, Xin Wang, Wei Dong, Junqiu Wang, Hongbin Zha

We present a novel end-to-end visual odometry architecture with guided feature selection based on deep convolutional recurrent neural networks.

feature selection Monocular Visual Odometry +1

MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment

no code implementations CVPR 2019 Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis

In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network.

Moment Retrieval Natural Language Moment Retrieval +1

Conditional Graph Neural Processes: A Functional Autoencoder Approach

no code implementations13 Dec 2018 Marcel Nassar, Xin Wang, Evren Tumer

Thus, we refer to our model as Conditional Graph Neural Process (CGNP).

Explanatory Graphs for CNNs

no code implementations18 Dec 2018 Quanshi Zhang, Xin Wang, Ruiming Cao, Ying Nian Wu, Feng Shi, Song-Chun Zhu

This paper introduces a graphical model, namely an explanatory graph, which reveals the knowledge hierarchy hidden inside conv-layers of a pre-trained CNN.

Object

Group Linguistic Bias Aware Neural Response Generation

no code implementations WS 2017 Jianan Wang, Xin Wang, Fang Li, Zhen Xu, Zhuoran Wang, Baoxun Wang

For practical chatbots, one of the essential factors for improving user experience is the capability of customizing the talking style of the agents, that is, making chatbots provide responses that meet users' preferences on language styles, topics, etc.

Response Generation

On Algorithms for Sparse Multi-factor NMF

no code implementations NeurIPS 2013 Siwei Lyu, Xin Wang

Nonnegative matrix factorization (NMF) is a popular data analysis method, the objective of which is to decompose a matrix with all nonnegative components into the product of two other nonnegative matrices.

Interpretable CNNs for Object Classification

no code implementations8 Jan 2019 Quanshi Zhang, Xin Wang, Ying Nian Wu, Huilin Zhou, Song-Chun Zhu

This paper proposes a generic method to learn interpretable convolutional filters in a deep convolutional neural network (CNN) for object classification, where each interpretable filter encodes features of a specific object part.

Classification General Classification +1

Residual Attention based Network for Hand Bone Age Assessment

no code implementations21 Dec 2018 Eric Wu, Bin Kong, Xin Wang, Junjie Bai, Yi Lu, Feng Gao, Shaoting Zhang, Kunlin Cao, Qi Song, Siwei Lyu, Youbing Yin

The hierarchical attention components of the residual attention subnet force our network to focus on the key components of the X-ray images and generate the final predictions as well as the associated visual supports, which is similar to the assessment procedure of clinicians.

Hand Segmentation

Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models

no code implementations ACL 2019 Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin

Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables.

Sentence Text Generation

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization

no code implementations15 Feb 2019 Hesham Mostafa, Xin Wang

We evaluate the performance of dynamic reallocation methods in training deep convolutional networks and show that our method outperforms previous static and dynamic reparameterization methods, yielding the best accuracy for a fixed parameter budget, on par with accuracies obtained by iteratively pruning a pre-trained dense model.

Attention-driven Tree-structured Convolutional LSTM for High Dimensional Data Understanding

no code implementations29 Jan 2019 Bin Kong, Xin Wang, Junjie Bai, Yi Lu, Feng Gao, Kunlin Cao, Qi Song, Shaoting Zhang, Siwei Lyu, Youbing Yin

In order to address these limitations, we present tree-structured ConvLSTM models for tree-structured image analysis tasks which can be trained end-to-end.

Vocal Bursts Intensity Prediction

DeepCenterline: a Multi-task Fully Convolutional Network for Centerline Extraction

no code implementations 25 Mar 2019 Zhihui Guo, Junjie Bai, Yi Lu, Xin Wang, Kunlin Cao, Qi Song, Milan Sonka, Youbing Yin

The proposed method generates well-positioned centerlines, exhibits fewer missing branches, and is more robust in the presence of minor imperfections of the object segmentation mask.

Object Semantic Segmentation

Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

no code implementations 29 Mar 2019 Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi

We propose using an extended model architecture of Tacotron, that is a multi-source sequence-to-sequence model with a dual attention mechanism as the shared model for both the TTS and VC tasks.

Speech Synthesis Voice Conversion

Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora

no code implementations 1 Apr 2019 Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

When the available data of a target speaker is insufficient to train a high quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead.

Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation

no code implementations NAACL 2019 Jiawei Wu, Xin Wang, William Yang Wang

The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs.

Sentence Translation +1

ACE: Adapting to Changing Environments for Semantic Segmentation

no code implementations ICCV 2019 Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis

However, neural classifiers are often extremely brittle when confronted with domain shift: changes in the input distribution that occur over time.

Meta-Learning Semantic Segmentation

Maximum Correntropy Criterion with Variable Center

no code implementations 13 Apr 2019 Badong Chen, Xin Wang, Yingsong Li, Jose C. Principe

The kernel function in correntropy is usually restricted to the Gaussian function with center located at zero.

Position

Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction

no code implementations 29 Apr 2019 Mingming Sun, Xu Li, Xin Wang, Miao Fan, Yue Feng, Ping Li

In this paper, we consider the problem of open information extraction (OIE) for extracting entity and relation level intermediate structures from sentences in open-domain.

Attribute Open Information Extraction +3

Neural source-filter waveform models for statistical parametric speech synthesis

no code implementations 27 Apr 2019 Xin Wang, Shinji Takaki, Junichi Yamagishi

Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and IAF-based models and train an IAF model by transferring the knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation.

Speech Synthesis
