Search Results for author: Yong Xu

Found 148 papers, 74 papers with code

FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction

1 code implementation 28 May 2024 Zhonghang Li, Lianghao Xia, Yong Xu, Chao Huang

Additionally, we incorporate a distribution mapping mechanism to align the data distributions of pre-training and downstream data, facilitating effective knowledge transfer in spatio-temporal forecasting.

FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection

no code implementations 16 May 2024 Siliang Ma, Yong Xu

To improve the efficiency and accuracy of bounding box regression for rotated object detection, we propose a novel metric for comparing arbitrary shapes based on minimum points distance, which takes into account most of the factors from existing loss functions for rotated object detection, i.e., the overlap or non-overlapping area, the central points distance, and the rotation angle.

Object object-detection +4

Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

no code implementations NeurIPS 2023 Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages.

Multi-Label Classification Multi-Label Learning +1

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

no code implementations 28 Mar 2024 Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie

In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues.

Clustering Graph Embedding +1

AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework

1 code implementation 19 Mar 2024 Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin

The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering.

Benchmarking Question Answering +2

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

no code implementations 8 Mar 2024 Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.

Inductive Bias Position +1

UrbanGPT: Spatio-Temporal Large Language Models

1 code implementation 25 Feb 2024 Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang

These findings highlight the potential of building large language models for spatio-temporal learning, particularly in zero-shot scenarios where labeled data is scarce.

Unsupervised Sign Language Translation and Generation

no code implementations 12 Feb 2024 Zhengsheng Guo, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Kehai Chen, Zhaopeng Tu, Yong Xu, Min Zhang

Motivated by the success of unsupervised neural machine translation (UNMT), we introduce an unsupervised sign language translation and generation network (USLNet), which learns from abundant single-modality (text and video) data without parallel sign language data.

Machine Translation Sign Language Translation +1

3D Shape Completion on Unseen Categories: A Weakly-supervised Approach

no code implementations 19 Jan 2024 Lintai Wu, Junhui Hou, Linqi Song, Yong Xu

Specifically, we construct a prior bank consisting of representative shapes from the seen categories.

TaskWeaver: A Code-First Agent Framework

1 code implementation 29 Nov 2023 Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Natural Language Understanding

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

1 code implementation 7 Nov 2023 Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei Zhang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

To address these limitations, we introduce a novel thought prompting approach called "Everything of Thoughts" (XoT) to defy the law of the "Penrose triangle" of existing thought paradigms.

Decision Making

GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks

1 code implementation NeurIPS 2023 Zhonghang Li, Lianghao Xia, Yong Xu, Chao Huang

This strategy guides the mask autoencoder in learning robust spatio-temporal representations and facilitates the modeling of different relationships, ranging from intra-cluster to inter-cluster, in an easy-to-hard training manner.

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations 2 Oct 2023 Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising Self-Supervised Learning +2

Cross-Modal Vertical Federated Learning for MRI Reconstruction

no code implementations 5 Jun 2023 Yunlu Yan, Hong Wang, Yawen Huang, Nanjun He, Lei Zhu, Yuexiang Li, Yong Xu, Yefeng Zheng

To this end, we formulate this practical-yet-challenging cross-modal vertical federated learning task, in which shape data from multiple hospitals have different modalities with a small amount of multi-modality data collected from the same individuals.

Disentanglement MRI Reconstruction +1

Graph Transformer for Recommendation

1 code implementation 4 Jun 2023 Chaoliu Li, Lianghao Xia, Xubin Ren, Yaowen Ye, Yong Xu, Chao Huang

This paper presents a novel approach to representation learning in recommender systems by integrating generative self-supervised learning with graph transformer architecture.

Collaborative Filtering Data Augmentation +3

Design and Implementation of Emergency Simulated Lighting System Based on Tello UAV

no code implementations 15 May 2023 Yexin Pan, Yong Xu, Bo Ma, Chuanhuang Li

Third, the flight control module has designed a specialized command control framework based on Tello UAV's API, which converts the planned flight path into command statements, forms flight text, and controls the flight of unmanned aerial vehicles accordingly.

LMEye: An Interactive Perception Network for Large Language Models

1 code implementation 5 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang

LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term as the dynamic visual information interaction.

Language Modelling Large Language Model +1

Information Recovery-Driven Deep Incomplete Multiview Clustering Network

2 code implementations 2 Apr 2023 Chengliang Liu, Jie Wen, Zhihao Wu, Xiaoling Luo, Chao Huang, Yong Xu

Concretely, a two-stage autoencoder network with the self-attention structure is built to synchronously extract high-level semantic representations of multiple views and recover the missing data.

Clustering Graph Reconstruction +3

Learning Reliable Representations for Incomplete Multi-View Partial Multi-Label Classification

no code implementations 30 Mar 2023 Chengliang Liu, Jie Wen, Yong Xu, Liqiang Nie, Min Zhang

The application of multi-view contrastive learning has further facilitated this process; however, existing multi-view contrastive learning methods crudely separate the so-called negative pairs, which largely results in the separation of samples belonging to the same or similar categories.

Classification Contrastive Learning +3

DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

2 code implementations 15 Mar 2023 Chengliang Liu, Jie Wen, Xiaoling Luo, Chao Huang, Zhihao Wu, Yong Xu

To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet.

Contrastive Learning Missing Labels

Graph-less Collaborative Filtering

1 code implementation 15 Mar 2023 Lianghao Xia, Chao Huang, Jiao Shi, Yong Xu

Motivated by these limitations, we propose a simple and effective collaborative filtering model (SimRec) that marries the power of knowledge distillation and contrastive learning.

Collaborative Filtering Contrastive Learning +2

Disentangled Graph Social Recommendation

1 code implementation 14 Mar 2023 Lianghao Xia, Yizhen Shao, Chao Huang, Yong Xu, Huance Xu, Jian Pei

In this work, we design a Disentangled Graph Neural Network (DGNN) with the integration of latent memory units, which empowers DGNN to maintain factorized representations for heterogeneous types of user and item connections.

Recommendation Systems

Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

1 code implementation 13 Mar 2023 Chengliang Liu, Jie Wen, Xiaoling Luo, Yong Xu

The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embedding to improve classification performance.

Multi-Label Classification Multi-Label Learning +1

Heterogeneous Graph Contrastive Learning for Recommendation

1 code implementation 2 Mar 2023 Mengru Chen, Chao Huang, Lianghao Xia, Wei Wei, Yong Xu, Ronghua Luo

In light of this, we propose Heterogeneous Graph Contrastive Learning (HGCL), which is able to incorporate heterogeneous relational semantics into user-item interaction modeling with contrastive learning-enhanced knowledge transfer across different views.

Contrastive Learning Recommendation Systems +3

Multi-Behavior Graph Neural Networks for Recommender System

no code implementations 17 Feb 2023 Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo

Recent years have witnessed the emerging success of many deep learning-based recommendation models for augmenting collaborative filtering architectures with various neural network architectures, such as multi-layer perceptron and autoencoder.

Collaborative Filtering Recommendation Systems +1

CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection

no code implementations CVPR 2023 Yabo Liu, Jinghua Wang, Chao Huang, YaoWei Wang, Yong Xu

To overcome these problems, we propose a cross-modality graph reasoning adaptation (CIGAR) method to take advantage of both visual and linguistic knowledge.

Graph Matching object-detection +1

Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering

1 code implementation CVPR 2023 Jie Wen, Chengliang Liu, Gehui Xu, Zhihao Wu, Chao Huang, Lunke Fei, Yong Xu

Graph-based multi-view clustering has attracted extensive attention because of the powerful clustering-structure representation ability and noise robustness.

Clustering Graph Learning +1

Coherent Event Guided Low-Light Video Enhancement

no code implementations ICCV 2023 Jinxiu Liang, Yixin Yang, Boyu Li, Peiqi Duan, Yong Xu, Boxin Shi

With frame-based cameras, capturing fast-moving scenes without suffering from blur often comes at the cost of low SNR and low contrast.

Video Enhancement

Universal Object Detection with Large Vision Model

1 code implementation 19 Dec 2022 Feng Lin, Wenze Hu, YaoWei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang

In this study, our focus is on a specific challenge: the large-scale, multi-domain universal object detection problem, which contributes to the broader goal of achieving a universal vision system.

Object object-detection +1

Leveraging Single-View Images for Unsupervised 3D Point Cloud Completion

1 code implementation 1 Dec 2022 Lintai Wu, Qijian Zhang, Junhui Hou, Yong Xu

The experimental results of our method are superior to those of the state-of-the-art unsupervised methods by a large margin.

Point Cloud Completion

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

no code implementations 22 Nov 2022 Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

While current deep learning (DL)-based beamforming techniques have proved effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times, making them unsuitable for real-world use.

Speech Separation

Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

1 code implementation 14 Nov 2022 Jiaxin Ye, Xin-Cheng Wen, Yujie Wei, Yong Xu, KunHong Liu, Hongming Shan

Specifically, TIM-Net first employs temporal-aware blocks to learn temporal affective representation, then integrates complementary information from the past and the future to enrich contextual representations, and finally, fuses multiple time scale features for better adaptation to the emotional variation.

Speech Emotion Recognition

GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition

1 code implementation 28 Oct 2022 Jia-Xin Ye, Xin-Cheng Wen, Xuan-Ze Wang, Yong Xu, Yan Luo, Chang-Li Wu, Li-Yan Chen, Kun-Hong Liu

In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) to construct a novel emotional causality representation learning component with a multi-scale receptive field.

Representation Learning Speech Emotion Recognition

Prompt-driven efficient Open-set Semi-supervised Learning

no code implementations 28 Sep 2022 Haoran Li, Chun-Mei Feng, Tao Zhou, Yong Xu, Xiaojun Chang

In this paper, we propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.

Computational Efficiency Outlier Detection

A Survey on Incomplete Multi-view Clustering

1 code implementation 17 Aug 2022 Jie Wen, Zheng Zhang, Lunke Fei, Bob Zhang, Yong Xu, Zhao Zhang, Jinxing Li

However, in practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views of the samples are available, which leads to the failure of conventional multi-view clustering methods.

Clustering Incomplete multi-view clustering

Localized Sparse Incomplete Multi-view Clustering

1 code implementation 5 Aug 2022 Chengliang Liu, Zhihao Wu, Jie Wen, Chao Huang, Yong Xu

Moreover, a novel local graph embedding term is introduced to learn the structured consensus representation.

Clustering Graph Embedding +2

CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition

no code implementations 18 Jul 2022 Xin-Cheng Wen, Jia-Xin Ye, Yan Luo, Yong Xu, Xuan-Ze Wang, Chang-Li Wu, Kun-Hong Liu

For the single-corpus task, the combination of Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding the self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules.

Cross-corpus Speech Emotion Recognition +1

Multi-Behavior Sequential Recommendation with Temporal Graph Transformer

1 code implementation 6 Jun 2022 Lianghao Xia, Chao Huang, Yong Xu, Jian Pei

The new TGT method endows the sequential recommendation architecture to distill dedicated knowledge for type-specific behavior relational context and the implicit behavior dependencies.

Sequential Recommendation

Deniable Steganography

no code implementations 25 May 2022 Yong Xu, Zhihua Xia, Zichi Wang, Xinpeng Zhang, Jian Weng

Once stego media is discovered, the adversary could identify the sender or receiver and coerce them to disclose the secret message, which we call a coercive attack in this paper.


NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations 20 May 2022 Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back.

Acoustic echo cancellation Speech Enhancement +2

Hypergraph Contrastive Collaborative Filtering

1 code implementation 26 Apr 2022 Lianghao Xia, Chao Huang, Yong Xu, Jiashu Zhao, Dawei Yin, Jimmy Xiangji Huang

Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on the hypergraph-enhanced self-discrimination.

Collaborative Filtering Contrastive Learning +2

Spatial-Temporal Hypergraph Self-Supervised Learning for Crime Prediction

1 code implementation 18 Apr 2022 Zhonghang Li, Chao Huang, Lianghao Xia, Yong Xu, Jian Pei

Crime has become a major concern in many cities, creating a rising demand for timely prediction of citywide crime occurrence.

Crime Prediction Decision Making +1

Global-Supervised Contrastive Loss and View-Aware-Based Post-Processing for Vehicle Re-Identification

no code implementations 17 Apr 2022 Zhijun Hu, Yong Xu, Jie Wen, Xianjing Cheng, Zaijun Zhang, Lilei Sun, YaoWei Wang

The proposed VABPP method is the first use of a view-aware-based method as a post-processing method in the field of vehicle re-identification.

Attribute Vehicle Re-Identification

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

1 code implementation 31 Mar 2022 Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.

Decoder speaker-diarization +2

Fine-Grained Object Classification via Self-Supervised Pose Alignment

2 code implementations CVPR 2022 Xuhui Yang, YaoWei Wang, Ke Chen, Yong Xu, Yonghong Tian

Semantic patterns of fine-grained objects are determined by subtle appearance differences of local parts, which has inspired a number of part-based methods.

Classification Object +1

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

1 code implementation 17 Feb 2022 Wei Wei, Chao Huang, Lianghao Xia, Yong Xu, Jiashu Zhao, Dawei Yin

In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users.

Contrastive Learning Meta-Learning

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

1 code implementation 10 Jan 2022 Lianghao Xia, Chao Huang, Yong Xu, Huance Xu, Xiang Li, WeiGuo Zhang

As the deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, auto-encoder and graph neural networks.

Collaborative Filtering Recommendation Systems

Multi-Behavior Enhanced Recommendation with Cross-Interaction Collaborative Relation Modeling

1 code implementation 7 Jan 2022 Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Mengyin Lu, Liefeng Bo

Because they overlook users' multi-behavioral patterns over different items, existing recommendation methods are insufficient to capture heterogeneous collaborative signals from user multi-behavior data.

Collaborative Filtering Recommendation Systems +1

Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning

1 code implementation IJCAI 2021 Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo, Xiyue Zhang, Tianyi Chen

Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, as crime events are distributed unevenly in both the spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage), which reveal fine-grained semantics of crimes.

Crime Prediction Relation

SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis

no code implementations CVPR 2022 Tianyi Chen, Yunfei Zhang, Xiaoyang Huo, Si Wu, Yong Xu, Hau San Wong

To reduce the dependence of generative models on labeled data, we propose a semi-supervised hyper-spherical GAN for class-conditional fine-grained image generation, and our model is referred to as SphericGAN.

Generative Adversarial Network Image Generation

Specificity-Preserving Federated Learning for MR Image Reconstruction

1 code implementation 9 Dec 2021 Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao, Huazhu Fu

The core idea is to divide the MR reconstruction model into two parts: a globally shared encoder to obtain a generalized representation at the global level, and a client-specific decoder to preserve the domain-specific properties of each client, which is important for collaborative reconstruction when the clients have unique distribution.

Federated Learning Image Reconstruction +1
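The encoder/decoder split described in the entry above can be illustrated with a toy aggregation step. This is only a minimal sketch, not the authors' implementation: it assumes a plain unweighted mean over clients, and the `average_shared_encoder` function and the nested-dictionary parameter layout are hypothetical names chosen for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of weights


def average_shared_encoder(clients: List[Dict[str, Params]]) -> None:
    """Average the 'encoder' weights across clients in place.

    Each client's 'decoder' weights are left untouched, so they stay
    client-specific. The unweighted mean is an assumption of this sketch.
    """
    n = len(clients)
    names = list(clients[0]["encoder"].keys())
    averaged = {
        name: [sum(vals) / n for vals in zip(*(c["encoder"][name] for c in clients))]
        for name in names
    }
    for c in clients:
        # Give every client its own copy of the shared encoder weights.
        c["encoder"] = {name: list(vals) for name, vals in averaged.items()}


# Two toy clients with different encoder and decoder weights.
clients = [
    {"encoder": {"w": [1.0, 2.0]}, "decoder": {"w": [10.0]}},
    {"encoder": {"w": [3.0, 4.0]}, "decoder": {"w": [20.0]}},
]
average_shared_encoder(clients)
print(clients[0]["encoder"], clients[0]["decoder"])
print(clients[1]["encoder"], clients[1]["decoder"])
```

After the call, both clients hold the same encoder weights while each keeps its own decoder, which is the property the snippet attributes to the shared-encoder, client-specific-decoder design.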

Encoding Spatial Distribution of Convolutional Features for Texture Representation

1 code implementation NeurIPS 2021 Yong Xu, Feng Li, Zhile Chen, Jinxiu Liang, Yuhui Quan

Existing convolutional neural networks (CNNs) often use global average pooling (GAP) to aggregate feature maps into a single representation.

Material Recognition Retrieval +1
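For readers unfamiliar with global average pooling (GAP), mentioned in the entry above, the following minimal sketch shows the operation on toy data. It uses plain Python lists rather than any deep learning library, and the function name `global_average_pool` is a hypothetical choice for illustration, not taken from the paper.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel: H rows of W values


def global_average_pool(feature_maps: List[FeatureMap]) -> List[float]:
    """Collapse each H x W channel to its mean, giving one value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two 2x2 channels with the same mean but different spatial layouts.
uniform = [[0.5, 0.5], [0.5, 0.5]]
checker = [[1.0, 0.0], [0.0, 1.0]]
print(global_average_pool([uniform, checker]))  # prints [0.5, 0.5]
```

Both channels pool to the same value even though their spatial arrangements differ, which shows that GAP on its own keeps no information about where activations occur.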

Joint Neural AEC and Beamforming with Double-Talk Detection

no code implementations 9 Nov 2021 Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

We train the proposed model in an end-to-end approach to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions.

Acoustic echo cancellation Denoising +2

Deep multi-modal aggregation network for MR image reconstruction with auxiliary modality

2 code implementations 15 Oct 2021 Chun-Mei Feng, Huazhu Fu, Tianfei Zhou, Yong Xu, Ling Shao, David Zhang

Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to, for example, motion artifacts.

Image Reconstruction

Social Recommendation with Self-Supervised Metagraph Informax Network

1 code implementation 8 Oct 2021 Xiaoling Long, Chao Huang, Yong Xu, Huance Xu, Peng Dai, Lianghao Xia, Liefeng Bo

To model relation heterogeneity, we design a metapath-guided heterogeneous graph neural network to aggregate feature embeddings from different types of meta-relations across users and items, empowering SMIN to maintain dedicated representations for multi-faceted user- and item-wise dependencies.

Collaborative Filtering Recommendation Systems

Graph Meta Network for Multi-Behavior Recommendation

1 code implementation 8 Oct 2021 Lianghao Xia, Yong Xu, Chao Huang, Peng Dai, Liefeng Bo

Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions.

Meta-Learning Recommendation Systems +1

Multiplex Behavioral Relation Learning for Recommendation via Memory Augmented Transformer Network

1 code implementation 8 Oct 2021 Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Bo Zhang, Liefeng Bo

Overlooking multiplex behavior relations makes it hard to recognize the multi-modal contextual signals across different types of interactions, which limits the feasibility of current recommendation methods.

Recommendation Systems Relation +1

Global Context Enhanced Social Recommendation with Hierarchical Graph Neural Networks

1 code implementation 8 Oct 2021 Huance Xu, Chao Huang, Yong Xu, Lianghao Xia, Hao Xing, Dawei Yin

Social recommendation aims to leverage social connections among users to enhance recommendation performance.

Recommendation Systems

Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network

1 code implementation 8 Oct 2021 Xiyue Zhang, Chao Huang, Yong Xu, Lianghao Xia, Peng Dai, Liefeng Bo, Junbo Zhang, Yu Zheng

Accurate forecasting of citywide traffic flow plays a critical role in a variety of spatial-temporal mining applications, such as intelligent traffic control and public risk assessment.

Traffic Prediction

Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation

1 code implementation 8 Oct 2021 Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Xiyue Zhang, Hongsheng Yang, Jian Pei, Liefeng Bo

In particular: i) complex inter-dependencies across different types of user behaviors; ii) the incorporation of knowledge-aware item relations into the multi-behavior recommendation framework; iii) dynamic characteristics of multi-typed user-item interactions.

Graph Attention Recommendation Systems

Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation

1 code implementation 8 Oct 2021 Chao Huang, Jiahui Chen, Lianghao Xia, Yong Xu, Peng Dai, Yanqing Chen, Liefeng Bo, Jiashu Zhao, Jimmy Xiangji Huang

The learning processes of intra- and inter-session transition dynamics are integrated to preserve the underlying low- and high-level item relationships in a common latent space.

Multi-Task Learning Relation +1

Knowledge-aware Coupled Graph Neural Network for Social Recommendation

1 code implementation 8 Oct 2021 Chao Huang, Huance Xu, Yong Xu, Peng Dai, Lianghao Xia, Mengyin Lu, Liefeng Bo, Hao Xing, Xiaoping Lai, Yanfang Ye

While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not yet been well addressed: (i) the majority of models only consider users' social connections, while ignoring the inter-dependent knowledge across items; (ii) most existing solutions are designed for a single type of user-item interaction, making them unable to capture interaction heterogeneity; (iii) the dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques.

Collaborative Filtering Recommendation Systems

Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

1 code implementation 3 Sep 2021 Chun-Mei Feng, Yunlu Yan, Kai Yu, Yong Xu, Ling Shao, Huazhu Fu

Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image.

Image Super-Resolution

Heterogeneous relational message passing networks for molecular dynamics simulations

no code implementations 2 Sep 2021 Zun Wang, Chong Wang, Sibo Zhao, Yong Xu, Shaogang Hao, Chang Yu Hsieh, Bing-Lin Gu, Wenhui Duan

With many frameworks based on message passing neural networks proposed to predict molecular and bulk properties, machine learning methods have tremendously shifted the paradigms of computational sciences underpinning physics, material science, chemistry, and biology.

BIG-bench Machine Learning

Fully Non-Homogeneous Atmospheric Scattering Modeling with Convolutional Neural Networks for Single Image Dehazing

no code implementations 25 Aug 2021 Cong Wang, Yan Huang, Yuexian Zou, Yong Xu

However, it is noted that the performance of ASM-based SIDM degrades when dehazing real-world hazy images, due to the limited modelling ability of ASM, in which the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed to be constants for one image.

Image Dehazing Single Image Dehazing
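For context, the atmospheric scattering model (ASM) referred to in the entry above is commonly written in the dehazing literature as shown below. The snippet itself does not give the equation, so this is the standard textbook form rather than the paper's own formulation. Here $I$ is the observed hazy image, $J$ the haze-free scene radiance, $t$ the transmission map, $d$ the scene depth, $A$ the atmospheric light and $\beta$ the scattering coefficient. Matching $A$ and $\beta$ to the snippet's ALF and ASC is an assumption.

```latex
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta\, d(x)}
```

In this form $A$ and $\beta$ do not depend on the pixel position $x$, which corresponds to the constant-per-image assumption the snippet describes as limiting.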

Multi-Modal Transformer for Accelerated MR Imaging

1 code implementation 27 Jun 2021 Chun-Mei Feng, Yunlu Yan, Geng Chen, Yong Xu, Ling Shao, Huazhu Fu

To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging.

Image Reconstruction Super-Resolution

Dual-Stream Reciprocal Disentanglement Learning for Domain Adaptation Person Re-Identification

1 code implementation 26 Jun 2021 Huafeng Li, Kaixiong Xu, Jinxing Li, Guangming Lu, Yong Xu, Zhengtao Yu, David Zhang

Since human-labeled samples are free for the target set, unsupervised person re-identification (Re-ID) has attracted much attention in recent years, by additionally exploiting the source set.

Disentanglement Domain Adaptation +2

Deep Texture Recognition via Exploiting Cross-Layer Statistical Self-Similarity

no code implementations CVPR 2021 Zhile Chen, Feng Li, Yuhui Quan, Yong Xu, Hui Ji

In recent years, convolutional neural networks (CNNs) have become a prominent tool for texture recognition.

Multi-Contrast MRI Super-Resolution via a Multi-Stage Integration Network

1 code implementation 19 May 2021 Chun-Mei Feng, Huazhu Fu, Shuhao Yuan, Yong Xu

In this work, we propose a multi-stage integration network (i.e., MINet) for multi-contrast MRI SR, which explicitly models the dependencies between multi-contrast images at different stages to guide image SR.


DONet: Dual-Octave Network for Fast MR Image Reconstruction

no code implementations 12 May 2021 Chun-Mei Feng, Zhanyuan Yang, Huazhu Fu, Yong Xu, Jian Yang, Ling Shao

In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and imaginary components of MR data, for fast parallel MR image reconstruction.

Image Reconstruction

Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction

1 code implementation 12 Apr 2021 Chun-Mei Feng, Zhanyuan Yang, Geng Chen, Yong Xu, Ling Shao

We evaluate the performance of the proposed model on the acceleration of multi-coil MR image reconstruction.

Image Reconstruction

MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment

no code implementations 2 Apr 2021 Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu

Objective speech quality assessment is usually conducted by comparing the received speech signal with its clean reference, while human beings are capable of evaluating speech quality without any reference, such as in mean opinion score (MOS) tests.

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations 31 Mar 2021 Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.

Room Impulse Response (RIR) Speech Dereverberation

Asymmetric CNN for image super-resolution

1 code implementation 25 Mar 2021 Chunwei Tian, Yong Xu, WangMeng Zuo, Chia-Wen Lin, David Zhang

In this paper, we propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.

Image Super-Resolution

Distributed Newton Optimization with Maximized Convergence Rate

no code implementations 17 Feb 2021 Damián Marelli, Yong Xu, Minyue Fu, Zenghong Huang

As the second step towards our goal, we complement the proposed method with a fully distributed method for estimating the optimal step size that maximizes convergence speed.

Distributed Optimization Optimization and Control

MultiFace: A Generic Training Mechanism for Boosting Face Recognition Performance

1 code implementation 25 Jan 2021 Jing Xu, Tszhang Guo, Yong Xu, Zenglin Xu, Kun Bai

Deep Convolutional Neural Networks (DCNNs) and their variants have recently been widely used in large-scale face recognition (FR).

Clustering Descriptive +1

Field-free spin-orbit torque-induced switching of perpendicular magnetization in a ferrimagnetic layer with vertical composition gradient

no code implementations 21 Jan 2021 Zhenyi Zheng, Yue Zhang, Victor Lopez-Dominguez, Luis Sánchez-Tejerina, Jiacheng Shi, Xueqiang Feng, Lei Chen, Zilu Wang, Zhizhong Zhang, Kun Zhang, Bin Hong, Yong Xu, Youguang Zhang, Mario Carpentieri, Albert Fert, Giovanni Finocchio, Weisheng Zhao, Pedram Khalili Amiri

Existing methods to do so involve the application of an in-plane bias magnetic field, or incorporation of in-plane structural asymmetry in the device, both of which can be difficult to implement in practical applications.

Mesoscale and Nanoscale Physics

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation

no code implementations 21 Jan 2021 Cong Wang, Yan Huang, Yuexian Zou, Yong Xu

However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which causes model mismatch and may result in color shift in deep models that use ASM.

Image Dehazing Single Image Dehazing

Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation

no code implementations ICCV 2021 Tianyi Chen, Yi Liu, Yunfei Zhang, Si Wu, Yong Xu, Feng Liangbing, Hau San Wong

To ensure disentanglement among the variables, we maximize mutual information between the class-independent variable and synthesized images, map real images to the latent space of a generator to perform consistency regularization of cross-class attributes, and incorporate class semantic-based regularization into a discriminator's feature space.

Disentanglement Image Generation

Hypergraph Neural Networks for Hypergraph Matching

1 code implementation ICCV 2021 Xiaowei Liao, Yong Xu, Haibin Ling

Specifically, given two hypergraphs to be matched, we first construct an association hypergraph over them and convert the hypergraph matching problem into a node classification problem on the association hypergraph.

Graph Matching Hypergraph Matching +1

Detection of magnetic gap in the topological surface states of MnBi2Te4

no code implementations 31 Dec 2020 Haoran Ji, Yanzhao Liu, He Wang, Jiawei Luo, Jiaheng Li, Hao Li, Yang Wu, Yong Xu, Jian Wang

An essential ingredient to realize these quantum states is the magnetic gap in the topological surface states induced by the out-of-plane ferromagnetism on the surface of MnBi2Te4.

Materials Science

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

no code implementations 24 Dec 2020 Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu

Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Vehicle Re-identification Based on Dual Distance Center Loss

no code implementations 23 Dec 2020 Zhijun Hu, Yong Xu, Jie Wen, Lilei Sun, Raja S P

Moreover, we design a Euclidean distance threshold between all center pairs, which not only strengthens the inter-class separability of center loss but also makes the center loss (or DDCL) work well without being combined with softmax loss.

Person Re-Identification Vehicle Re-Identification

Structural Disorder Induced Second-order Topological Insulators in Three Dimensions

no code implementations 22 Dec 2020 Jiong-Hao Wang, Yan-Bin Yang, Ning Dai, Yong Xu

Here we predict the existence of a second-order topological insulating phase in an amorphous system without any crystalline symmetry.

Mesoscale and Nanoscale Physics Disordered Systems and Neural Networks

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations 30 Oct 2020 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

1 code implementation 8 Sep 2020 Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling

The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing.

Object Tracking Visual Tracking +1

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

1 code implementation 21 Aug 2020 Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.

Speech Enhancement Speech Separation

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

1 code implementation 16 Aug 2020 Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu

Speech separation algorithms are often used to separate the target speech from other interfering sources.

Speech Separation

Recurrent Exposure Generation for Low-Light Face Detection

1 code implementation 21 Jul 2020 Jinxiu Liang, Jingwen Wang, Yuhui Quan, Tianyi Chen, Jiaying Liu, Haibin Ling, Yong Xu

REG progressively and efficiently produces intermediate images corresponding to various exposure settings, and these pseudo-exposures are then fused by MED to detect faces across different lighting conditions.

Face Detection Image Enhancement

Lightweight image super-resolution with enhanced CNN

1 code implementation 8 Jul 2020 Chunwei Tian, Ruibin Zhuge, Zhihao Wu, Yong Xu, WangMeng Zuo, Chen Chen, Chia-Wen Lin

Finally, the IRB uses coarse high-frequency features from the RB to learn more accurate SR features and construct an SR image.

Image Super-Resolution

Designing and Training of A Dual CNN for Image Denoising

1 code implementation 8 Jul 2020 Chunwei Tian, Yong Xu, WangMeng Zuo, Bo Du, Chia-Wen Lin, David Zhang

The enhancement block gathers and fuses the global and local features to provide complementary information for the latter network.

Image Denoising

Deep Bilateral Retinex for Low-Light Image Enhancement

no code implementations 4 Jul 2020 Jinxiu Liang, Yong Xu, Yuhui Quan, Jingwen Wang, Haibin Ling, Hui Ji

Low-light images, i.e., images captured in low-light conditions, suffer from very poor visibility caused by low contrast, color distortion and significant measurement noise.

Low-Light Image Enhancement

Neural Spatio-Temporal Beamformer for Target Speech Separation

1 code implementation 8 May 2020 Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu

Although purely neural network (NN) based speech separation and enhancement methods can achieve good objective scores, they inevitably cause nonlinear speech distortions that are harmful to automatic speech recognition (ASR).

Audio and Speech Processing Sound

Pathwise unique solutions and stochastic averaging for mixed stochastic partial differential equations driven by fractional Brownian motion and Brownian motion

no code implementations 11 Apr 2020 Bin Pei, Yuzuru Inahama, Yong Xu

This paper is devoted to a system of stochastic partial differential equations (SPDEs) that have a slow component driven by fractional Brownian motion (fBm) with the Hurst parameter $H >1/2$ and a fast component driven by fast-varying diffusion.

Probability Dynamical Systems 60G22, 60H05, 60H15, 34C29

Multi-modal Multi-channel Target Speech Separation

no code implementations 16 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.

Speech Separation

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations 9 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.

Speech Separation
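
As a rough illustration of the hand-crafted feature named in this entry, the sketch below computes an inter-channel phase difference for a single time-frequency bin. The function names `ipd` and `ipd_features` and the two complex STFT values are hypothetical, chosen only for this example; the cosine/sine encoding is a common way to avoid phase wrap-around and is an assumption here, not something the entry states.

```python
import cmath
import math


def ipd(bin_a: complex, bin_b: complex) -> float:
    """Phase of channel A minus phase of channel B, wrapped to [-pi, pi]."""
    # Multiplying by the conjugate subtracts the phases; cmath.phase wraps the result.
    return cmath.phase(bin_a * bin_b.conjugate())


def ipd_features(bin_a: complex, bin_b: complex) -> tuple:
    """Encode the IPD as (cos, sin) so that +pi and -pi map to the same point."""
    d = ipd(bin_a, bin_b)
    return math.cos(d), math.sin(d)


# Hypothetical STFT values of the same time-frequency bin at two microphones.
mic1 = cmath.rect(1.0, 0.9)  # magnitude 1.0, phase 0.9 rad
mic2 = cmath.rect(0.8, 0.4)  # magnitude 0.8, phase 0.4 rad
delta = ipd(mic1, mic2)
cos_ipd, sin_ipd = ipd_features(mic1, mic2)
```

In practice such values would be computed for every bin of a multi-channel STFT and for several microphone pairs; the sketch covers one bin and one pair only.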

Self-supervised learning for audio-visual speaker diarization

no code implementations 13 Feb 2020 Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang

Speaker diarization, which finds the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems.

Self-Supervised Learning speaker-diarization +2

Deep Learning on Image Denoising: An overview

no code implementations 31 Dec 2019 Chunwei Tian, Lunke Fei, Wenxian Zheng, Yong Xu, WangMeng Zuo, Chia-Wen Lin

However, there are substantial differences in the various types of deep learning methods dealing with image denoising.

Image Denoising

A Unified Framework for Speech Separation

no code implementations 17 Dec 2019 Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

The initial deep learning based solutions for speech separation transformed the speech signals into the time-frequency domain with the STFT; the encoded mixed signals were then fed into a deep neural network based separator.

Speech Separation

Adaptive GNN for Image Analysis and Editing

no code implementations NeurIPS 2019 Lingyu Liang, Lianwen Jin, Yong Xu

In practical verification, we design a new regularization structure with guided feature to produce GNN-based filtering and propagation diffusion to tackle the ill-posed inverse problems of quotient image analysis (QIA), which recovers the reflectance ratio as a signature for image analysis or adjustment.

Low-Light Image Enhancement

Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

no code implementations 16 Sep 2019 Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu

Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.

Audio and Speech Processing Sound Signal Processing

Image denoising using deep CNN with batch renormalization

2 code implementations Neural Networks 2019 Chunwei Tian, Yong Xu, WangMeng Zuo

In this paper, we report the design of a novel network called a batch-renormalization denoising network (BRDNet).

Image Denoising

Dedge-AGMNet: an effective stereo matching network optimized by depth edge auxiliary task

no code implementations 25 Aug 2019 Weida Yang, Xindong Ai, Zuliu Yang, Yong Xu, Yong Zhao

To improve the performance in ill-posed regions, this paper proposes an atrous granular multi-scale network based on a depth edge subnetwork (Dedge-AGMNet).

3D Architecture Disparity Estimation +3

Coupled-Projection Residual Network for MRI Super-Resolution

no code implementations 12 Jul 2019 Chun-Mei Feng, Kai Wang, Shijian Lu, Yong Xu, Heng Kong, Ling Shao

The deep sub-network learns from the residuals of the high-frequency image information, where multiple residual blocks are cascaded to magnify the MRI images at the last network layer.


Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

1 code implementation 14 Jun 2019 Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.

Generative Adversarial Network Image Inpainting

Robust Classification with Sparse Representation Fusion on Diverse Data Subsets

no code implementations 10 Jun 2019 Chun-Mei Feng, Yong Xu, Zuoyong Li, Jian Yang

It performs Sparse Representation Fusion based on the Diverse Subset of training samples (SRFDS), which reduces the impact of randomness of the sample set and enhances the robustness of classification results.

General Classification Robust classification

A comprehensive study of speech separation: spectrogram vs waveform separation

no code implementations 17 May 2019 Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency and time-domain separators by utilizing spectral, spatial and speaker location information.

speech-recognition Speech Recognition +1

End-to-End Multi-Channel Speech Separation

no code implementations 15 May 2019 Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation.

Speech Separation

Time Domain Audio Visual Speech Separation

no code implementations 7 Apr 2019 Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing Sound

Image Cartoon-Texture Decomposition Using Isotropic Patch Recurrence

no code implementations 10 Nov 2018 Ruotao Xu, Yuhui Quan, Yong Xu

Aiming at separating the cartoon and texture layers from an image, cartoon-texture decomposition approaches resort to image priors to model cartoon and texture respectively.

Enhanced CNN for image denoising

no code implementations 28 Oct 2018 Chunwei Tian, Yong Xu, Lunke Fei, Junqian Wang, Jie Wen, Nan Luo

Owing to the flexible architectures of deep convolutional neural networks (CNNs), CNNs have been successfully used for image denoising.

Image Denoising

Deep Learning for Image Denoising: A Survey

no code implementations 11 Oct 2018 Chunwei Tian, Yong Xu, Lunke Fei, Ke Yan

Since the proposal of big data analysis and the Graphics Processing Unit (GPU), deep learning technology has received a great deal of attention and has been widely applied in the field of image processing.

BIG-bench Machine Learning Image Denoising

Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

2 code implementations 12 Apr 2018 Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley

Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.

Sound Audio and Speech Processing

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

1 code implementation CVPR 2018 Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu

We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.

Decoder Dense Video Captioning

A joint separation-classification model for sound event detection of weakly labelled data

2 code implementations 8 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.

Sound Audio and Speech Processing

Audio Set classification with attention model: A probabilistic perspective

5 code implementations 2 Nov 2017 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.

Sound Audio and Speech Processing
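
The sentence above describes a bag-level prediction as an expectation over instance-level predictions. A minimal sketch of that idea follows, assuming the learned probability measure is obtained by normalizing non-negative attention scores; the function name `bag_probability` and the numbers are hypothetical and serve only as an illustration.

```python
def bag_probability(instance_probs, attention_scores):
    """Expectation of instance-level outputs under a normalized attention measure."""
    total = sum(attention_scores)
    # Normalizing non-negative scores yields weights that sum to one,
    # i.e. a discrete probability measure over the instances in the bag.
    weights = [s / total for s in attention_scores]
    return sum(w * p for w, p in zip(weights, instance_probs))


# Hypothetical bag of three instances (e.g. segments of one audio clip).
probs = [0.9, 0.2, 0.1]   # instance-level class probabilities
scores = [3.0, 0.5, 0.5]  # non-negative attention scores (assumed given)
bag = bag_probability(probs, scores)
```

With these numbers the first instance receives weight 0.75, so the bag-level output is dominated by it.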

Large-scale weakly supervised audio classification using gated convolutional neural network

3 code implementations 1 Oct 2017 Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.

Sound Audio and Speech Processing

Discriminative Block-Diagonal Representation Learning for Image Recognition

no code implementations 12 Jul 2017 Zheng Zhang, Yong Xu, Ling Shao, Jian Yang

In particular, the elaborate BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semi-supervised framework of low-rank representation.

Representation Learning

Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation

3 code implementations CVPR 2017 Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, WangMeng Zuo

Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probability on the source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable.

Unsupervised Domain Adaptation
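
The idea of re-weighting source samples by class priors can be sketched as follows. This is an illustrative simplification under stated assumptions, not the paper's formulation: it uses a linear kernel (so the MMD reduces to a distance between feature means), takes the weight of a source sample to be the ratio of target to source class prior, and assumes target pseudo-labels are available from some classifier. The names `class_priors` and `weighted_linear_mmd2` are hypothetical.

```python
from collections import Counter


def class_priors(labels):
    """Empirical class prior probabilities from a list of labels."""
    counts = Counter(labels)
    return {c: k / len(labels) for c, k in counts.items()}


def weighted_linear_mmd2(src_x, src_y, tgt_x, tgt_pseudo_y, use_weights=True):
    """Squared distance between the (weighted) source mean and the target mean."""
    ps = class_priors(src_y)
    pt = class_priors(tgt_pseudo_y)  # assumption: pseudo-labels stand in for target labels
    # Class-specific weight: target prior over source prior (1.0 if weighting is off).
    alpha = [pt.get(y, 0.0) / ps[y] if use_weights else 1.0 for y in src_y]
    total = sum(alpha)
    dim = len(src_x[0])
    src_mean = [sum(a * x[d] for a, x in zip(alpha, src_x)) / total for d in range(dim)]
    tgt_mean = [sum(x[d] for x in tgt_x) / len(tgt_x) for d in range(dim)]
    return sum((s - t) ** 2 for s, t in zip(src_mean, tgt_mean))


# Toy 1-D features: both domains share class-conditional features but differ in class balance.
src_x, src_y = [[0.0], [0.0], [0.0], [1.0]], [0, 0, 0, 1]
tgt_x, tgt_y = [[0.0], [1.0], [1.0], [1.0]], [0, 1, 1, 1]
plain = weighted_linear_mmd2(src_x, src_y, tgt_x, tgt_y, use_weights=False)
weighted = weighted_linear_mmd2(src_x, src_y, tgt_x, tgt_y)
```

In this toy case the unweighted discrepancy is nonzero purely because of the class imbalance, while the weighted version removes it.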

Learning Inverse Mapping by Autoencoder based Generative Adversarial Nets

no code implementations 29 Mar 2017 Junyu Luo, Yong Xu, Chenwei Tang, Jiancheng Lv

The inverse mapping of the generator of GANs (Generative Adversarial Nets) has great potential value. Hence, some works have been developed to construct the inverse function of the generator by direct learning or adversarial learning. While the results are encouraging, the problem is highly challenging, and the existing ways of training inverse models of GANs have many disadvantages, such as being hard to train or performing poorly. For these reasons, we propose a new approach that uses an inverse generator ($IG$) model as the encoder and a pre-trained generator ($G$) as the decoder of an AutoEncoder network to train the $IG$ model.


Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

no code implementations 21 Mar 2017 Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee

We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals.


Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

1 code implementation 17 Mar 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.


Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

2 code implementations 24 Feb 2017 Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.

Audio Tagging

A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data

1 code implementation 6 Oct 2016 Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley

The labeling of an audio clip is often based on the audio events in the clip and no event level label is provided to the user.


Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

2 code implementations 13 Jul 2016 Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley

For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Banks (MFBs) features.

Audio Tagging General Classification +1

Lecture bilingue augmentée par des alignements multi-niveaux (Augmenting bilingual reading with alignment information)

no code implementations JEPTALNRECITAL 2016 François Yvon, Yong Xu, Marianna Apidianaki, Clément Pillias, Cubaud Pierre

The work that led to this demonstration combines multilingual language processing tools, in particular automatic alignment, with visualization and interaction techniques.

Fully DNN-based Multi-label regression for audio tagging

no code implementations 24 Jun 2016 Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method can make good use of long-term temporal information by taking the whole chunk as input.

Audio Tagging Event Detection +4

Natural Scene Character Recognition Using Robust PCA and Sparse Representation

no code implementations 15 Jun 2016 Zheng Zhang, Yong Xu, Cheng-Lin Liu

Natural scene character recognition is challenging due to the cluttered background, which is hard to separate from text.

Sparse Coding for Classification via Discrimination Ensemble

no code implementations CVPR 2016 Yuhui Quan, Yong Xu, Yuping Sun, Yan Huang, Hui Ji

Discriminative sparse coding has emerged as a promising technique in image analysis and recognition, which couples the process of classifier training and the process of dictionary learning for improving the discriminability of sparse codes.

Classification Dictionary Learning +1

A survey of sparse representation: algorithms and applications

no code implementations 23 Feb 2016 Zheng Zhang, Yong Xu, Jian Yang, Xuelong Li, David Zhang

The main purpose of this article is to provide a comprehensive study and an updated review of sparse representation and to offer guidance for researchers.

Removing Rain From a Single Image via Discriminative Sparse Coding

no code implementations ICCV 2015 Yu Luo, Yong Xu, Hui Ji

The paper aims at developing an effective algorithm to remove the visual effects of rain from a single rainy image, i.e., to separate the rain layer and the de-rained image layer from a rainy image.

Dictionary Learning Rain Removal

Lacunarity Analysis on Image Patterns for Texture Classification

no code implementations CVPR 2014 Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo

Based on the concept of lacunarity in fractal geometry, we developed a statistical approach to texture description, which yields highly discriminative features with strong robustness to a wide range of transformations, including photometric changes and geometric changes.

Classification General Classification +1
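
For readers unfamiliar with lacunarity, the sketch below computes the classic gliding-box lacunarity of a binary image, defined as the second moment of the box mass divided by the squared first moment. It illustrates the general concept only and is not the texture descriptor proposed in the paper; the function name `gliding_box_lacunarity` and the toy images are hypothetical.

```python
def gliding_box_lacunarity(image, r):
    """Gliding-box lacunarity E[M^2] / E[M]^2 for box size r on a binary image."""
    rows, cols = len(image), len(image[0])
    masses = []
    # Slide an r-by-r box over every valid position and record its mass
    # (the number of foreground pixels it covers).
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            masses.append(sum(image[i + di][j + dj]
                              for di in range(r) for dj in range(r)))
    mean = sum(masses) / len(masses)
    second_moment = sum(m * m for m in masses) / len(masses)
    # Undefined (division by zero) if the image has no foreground pixels.
    return second_moment / (mean * mean)


# A checkerboard fills space evenly; a half-filled image is clustered and more "gappy".
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
clustered = [[1, 1, 0, 0] for _ in range(4)]
lac_checker = gliding_box_lacunarity(checker, 2)
lac_clustered = gliding_box_lacunarity(clustered, 2)
```

Both toy images contain the same number of foreground pixels, yet the clustered one has the higher lacunarity, which is what makes the measure sensitive to texture structure rather than density alone.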
