1 code implementation • 15 Mar 2023 • Lianghao Xia, Chao Huang, Jiao Shi, Yong Xu
Motivated by these limitations, we propose a simple and effective collaborative filtering model (SimRec) that marries the power of knowledge distillation and contrastive learning.
no code implementations • 15 Mar 2023 • Chengliang Liu, Jie Wen, Xiaoling Luo, Chao Huang, Zhihao Wu, Yong Xu
To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet.
no code implementations • 14 Mar 2023 • Lianghao Xia, Yizhen Shao, Chao Huang, Yong Xu, Huance Xu, Jian Pei
In this work, we design a Disentangled Graph Neural Network (DGNN) with the integration of latent memory units, which empowers DGNN to maintain factorized representations for heterogeneous types of user and item connections.
no code implementations • 13 Mar 2023 • Chengliang Liu, Jie Wen, Xiaoling Luo, Yong Xu
The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embedding to improve classification performance.
1 code implementation • 2 Mar 2023 • Mengru Chen, Chao Huang, Lianghao Xia, Wei Wei, Yong Xu, Ronghua Luo
In light of this, we propose Heterogeneous Graph Contrastive Learning (HGCL), which incorporates heterogeneous relational semantics into user-item interaction modeling with contrastive learning-enhanced knowledge transfer across different views.
no code implementations • 17 Feb 2023 • Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo
Recent years have witnessed the emerging success of many deep learning-based recommendation models for augmenting collaborative filtering architectures with various neural network architectures, such as multi-layer perceptron and autoencoder.
1 code implementation • 19 Dec 2022 • Feng Lin, Wenze Hu, YaoWei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang
Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system.
no code implementations • 1 Dec 2022 • Lintai Wu, Qijian Zhang, Junhui Hou, Yong Xu
The experimental results of our method are superior to those of the state-of-the-art unsupervised methods by a large margin.
no code implementations • 22 Nov 2022 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
While current deep learning (DL)-based beamforming techniques have proven effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times, making them unsuitable for real-world use.
1 code implementation • 14 Nov 2022 • Jiaxin Ye, Xin-Cheng Wen, Yujie Wei, Yong Xu, KunHong Liu, Hongming Shan
Specifically, TIM-Net first employs temporal-aware blocks to learn temporal affective representation, then integrates complementary information from the past and the future to enrich contextual representations, and finally, fuses multiple time scale features for better adaptation to the emotional variation.
Ranked #1 on Speech Emotion Recognition on EMOVO
2 code implementations • 28 Oct 2022 • Jia-Xin Ye, Xin-Cheng Wen, Xuan-Ze Wang, Yong Xu, Yan Luo, Chang-Li Wu, Li-Yan Chen, Kun-Hong Liu
In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) to construct a novel emotional causality representation learning component with a multi-scale receptive field.
no code implementations • 28 Sep 2022 • Haoran Li, Chun-Mei Feng, Tao Zhou, Yong Xu, Xiaojun Chang
In this paper, we propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.
1 code implementation • 17 Aug 2022 • Jie Wen, Zheng Zhang, Lunke Fei, Bob Zhang, Yong Xu, Zhao Zhang, Jinxing Li
However, in practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views of samples are available, which causes conventional multi-view clustering methods to fail.
1 code implementation • 5 Aug 2022 • Chengliang Liu, Zhihao Wu, Jie Wen, Chao Huang, Yong Xu
Moreover, a novel local graph embedding term is introduced to learn the structured consensus representation.
no code implementations • 18 Jul 2022 • Xin-Cheng Wen, Jia-Xin Ye, Yan Luo, Yong Xu, Xuan-Ze Wang, Chang-Li Wu, Kun-Hong Liu
For the single-corpus task, the combination of the Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding the self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2022 • Jinxiu Liang, Yong Xu, Yuhui Quan, Boxin Shi, Hui Ji
The enhancement is done by jointly optimizing the Retinex decomposition and the illumination adjustment.
1 code implementation • 6 Jun 2022 • Lianghao Xia, Chao Huang, Yong Xu, Jian Pei
The new TGT method enables the sequential recommendation architecture to distill dedicated knowledge for type-specific behavior relational context and the implicit behavior dependencies.
no code implementations • 25 May 2022 • Yong Xu, Zhihua Xia, Zichi Wang, Xinpeng Zhang, Jian Weng
Once stego media is discovered, the adversary could identify the sender or receiver and coerce them to disclose the secret message, which we call a coercive attack in this paper.
no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication as well as in front-end speech enhancement for recognition when the loudspeaker is playing back.
1 code implementation • 26 Apr 2022 • Lianghao Xia, Chao Huang, Yong Xu, Jiashu Zhao, Dawei Yin, Jimmy Xiangji Huang
Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on the hypergraph-enhanced self-discrimination.
1 code implementation • 18 Apr 2022 • Zhonghang Li, Chao Huang, Lianghao Xia, Yong Xu, Jian Pei
Crime has become a major concern in many cities, creating a rising demand for timely prediction of citywide crime occurrence.
no code implementations • 17 Apr 2022 • Zhijun Hu, Yong Xu, Jie Wen, Xianjing Cheng, Zaijun Zhang, Lilei Sun, YaoWei Wang
The proposed VABPP method is the first to use a view-aware approach as a post-processing method in the field of vehicle re-identification.
1 code implementation • 31 Mar 2022 • Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu
In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.
2 code implementations • CVPR 2022 • Xuhui Yang, YaoWei Wang, Ke Chen, Yong Xu, Yonghong Tian
Semantic patterns of fine-grained objects are determined by subtle appearance differences of local parts, which thus inspires a number of part-based methods.
1 code implementation • 17 Feb 2022 • Wei Wei, Chao Huang, Lianghao Xia, Yong Xu, Jiashu Zhao, Dawei Yin
In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users.
1 code implementation • 10 Jan 2022 • Lianghao Xia, Chao Huang, Yong Xu, Huance Xu, Xiang Li, WeiGuo Zhang
As the deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, auto-encoder and graph neural networks.
1 code implementation • 7 Jan 2022 • Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Mengyin Lu, Liefeng Bo
Because they overlook users' multi-behavioral patterns over different items, existing recommendation methods are insufficient to capture heterogeneous collaborative signals from user multi-behavior data.
1 code implementation • IJCAI 2021 • Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Liefeng Bo, Xiyue Zhang, Tianyi Chen
Crime prediction is crucial for public safety and resource optimization, yet it is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, as crime events are distributed unevenly over both spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e.g., Theft, Robbery, Assault, Damage), which reveal fine-grained semantics of crimes.
no code implementations • CVPR 2022 • Tianyi Chen, Yunfei Zhang, Xiaoyang Huo, Si Wu, Yong Xu, Hau San Wong
To reduce the dependence of generative models on labeled data, we propose a semi-supervised hyper-spherical GAN for class-conditional fine-grained image generation, and our model is referred to as SphericGAN.
1 code implementation • 9 Dec 2021 • Chun-Mei Feng, Yunlu Yan, Shanshan Wang, Yong Xu, Ling Shao, Huazhu Fu
The core idea is to divide the MR reconstruction model into two parts: a globally shared encoder to obtain a generalized representation at the global level, and a client-specific decoder to preserve the domain-specific properties of each client, which is important for collaborative reconstruction when the clients have unique distribution.
1 code implementation • 2 Dec 2021 • Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, Haibin Ling
The potential of Transformer in representation learning remains under-explored.
Ranked #4 on Visual Object Tracking on TrackingNet
1 code implementation • NeurIPS 2021 • Yong Xu, Feng Li, Zhile Chen, Jinxiu Liang, Yuhui Quan
Existing convolutional neural networks (CNNs) often use global average pooling (GAP) to aggregate feature maps into a single representation.
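For readers unfamiliar with global average pooling, the operation mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not code from the paper: the function name `global_average_pool` and the nested-list layout (channels, then rows, then columns) are assumptions made for the example.

```python
def global_average_pool(feature_maps):
    """Collapse each channel's H x W map to its mean value.

    feature_maps: list of C channels, each a list of H rows of W floats
    (a hypothetical layout chosen for this sketch).
    Returns a list of C floats, one per channel.
    """
    pooled = []
    for channel in feature_maps:
        # Flatten the spatial grid and average it into a single number.
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two 2x2 feature maps become a 2-dimensional descriptor.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
descriptor = global_average_pool(maps)
```

Because every spatial position contributes equally to the mean, the descriptor discards where in the map a response occurred.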
no code implementations • 9 Nov 2021 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
We train the proposed model in an end-to-end approach to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions.
2 code implementations • 15 Oct 2021 • Chun-Mei Feng, Huazhu Fu, Tianfei Zhou, Yong Xu, Ling Shao, David Zhang
Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to, say, motion artifacts.
1 code implementation • 8 Oct 2021 • Xiyue Zhang, Chao Huang, Yong Xu, Lianghao Xia, Peng Dai, Liefeng Bo, Junbo Zhang, Yu Zheng
Accurate forecasting of citywide traffic flow plays a critical role in a variety of spatial-temporal mining applications, such as intelligent traffic control and public risk assessment.
1 code implementation • 8 Oct 2021 • Xiaoling Long, Chao Huang, Yong Xu, Huance Xu, Peng Dai, Lianghao Xia, Liefeng Bo
To model relation heterogeneity, we design a metapath-guided heterogeneous graph neural network to aggregate feature embeddings from different types of meta-relations across users and items, empowering SMIN to maintain dedicated representations for multi-faceted user- and item-wise dependencies.
1 code implementation • 8 Oct 2021 • Lianghao Xia, Yong Xu, Chao Huang, Peng Dai, Liefeng Bo
Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions.
1 code implementation • 8 Oct 2021 • Chao Huang, Huance Xu, Yong Xu, Peng Dai, Lianghao Xia, Mengyin Lu, Liefeng Bo, Hao Xing, Xiaoping Lai, Yanfang Ye
While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not yet been well addressed: (i) the majority of models only consider users' social connections, while ignoring the inter-dependent knowledge across items; (ii) most existing solutions are designed for a single type of user-item interaction, making them unable to capture the interaction heterogeneity; (iii) the dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques.
1 code implementation • 8 Oct 2021 • Chao Huang, Jiahui Chen, Lianghao Xia, Yong Xu, Peng Dai, Yanqing Chen, Liefeng Bo, Jiashu Zhao, Jimmy Xiangji Huang
The learning processes of intra- and inter-session transition dynamics are integrated to preserve the underlying low- and high-level item relationships in a common latent space.
1 code implementation • 8 Oct 2021 • Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Xiyue Zhang, Hongsheng Yang, Jian Pei, Liefeng Bo
In particular: i) complex inter-dependencies across different types of user behaviors; ii) the incorporation of knowledge-aware item relations into the multi-behavior recommendation framework; iii) dynamic characteristics of multi-typed user-item interactions.
1 code implementation • 8 Oct 2021 • Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Bo Zhang, Liefeng Bo
Overlooking multiplex behavior relations makes it hard to recognize the multi-modal contextual signals across different types of interactions, which limits the feasibility of current recommendation methods.
1 code implementation • 8 Oct 2021 • Huance Xu, Chao Huang, Yong Xu, Lianghao Xia, Hao Xing, Dawei Yin
Social recommendation aims to leverage social connections among users to enhance the recommendation performance.
1 code implementation • 3 Sep 2021 • Chun-Mei Feng, Yunlu Yan, Kai Yu, Yong Xu, Ling Shao, Huazhu Fu
Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image.
no code implementations • 2 Sep 2021 • Zun Wang, Chong Wang, Sibo Zhao, Yong Xu, Shaogang Hao, Chang Yu Hsieh, Bing-Lin Gu, Wenhui Duan
With many frameworks based on message passing neural networks proposed to predict molecular and bulk properties, machine learning methods have tremendously shifted the paradigms of computational sciences underpinning physics, material science, chemistry, and biology.
no code implementations • 25 Aug 2021 • Cong Wang, Yan Huang, Yuexian Zou, Yong Xu
However, ASM-based SIDM performs worse when dehazing real-world hazy images due to the limited modelling ability of ASM, where the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed to be constant for one image.
1 code implementation • 27 Jun 2021 • Chun-Mei Feng, Yunlu Yan, Geng Chen, Yong Xu, Ling Shao, Huazhu Fu
To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging.
1 code implementation • 26 Jun 2021 • Huafeng Li, Kaixiong Xu, Jinxing Li, Guangming Lu, Yong Xu, Zhengtao Yu, David Zhang
Because it requires no human-labeled samples for the target set, unsupervised person re-identification (Re-ID), which additionally exploits the source set, has attracted much attention in recent years.
no code implementations • CVPR 2021 • Zhile Chen, Feng Li, Yuhui Quan, Yong Xu, Hui Ji
In recent years, convolutional neural networks (CNNs) have become a prominent tool for texture recognition.
1 code implementation • 12 Jun 2021 • Chun-Mei Feng, Yunlu Yan, Huazhu Fu, Li Chen, Yong Xu
Then, a task transformer module is designed to embed and synthesize the relevance between the two tasks.
Ranked #9 on Image Super-Resolution on IXI
1 code implementation • 19 May 2021 • Chun-Mei Feng, Huazhu Fu, Shuhao Yuan, Yong Xu
In this work, we propose a multi-stage integration network (i.e., MINet) for multi-contrast MRI SR, which explicitly models the dependencies between multi-contrast images at different stages to guide image SR.
no code implementations • 12 May 2021 • Chun-Mei Feng, Zhanyuan Yang, Huazhu Fu, Yong Xu, Jian Yang, Ling Shao
In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and imaginary components of MR data, for fast parallel MR image reconstruction.
no code implementations • 17 Apr 2021 • Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu
The spatial self-attention module is designed to attend to the cross-channel correlation in the covariance matrices.
Automatic Speech Recognition (ASR)
1 code implementation • 12 Apr 2021 • Chun-Mei Feng, Zhanyuan Yang, Geng Chen, Yong Xu, Ling Shao
We evaluate the performance of the proposed model on the acceleration of multi-coil MR image reconstruction.
no code implementations • 2 Apr 2021 • Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu
Objective speech quality assessment is usually conducted by comparing the received speech signal with its clean reference, while human beings are capable of evaluating speech quality without any reference, such as in mean opinion score (MOS) tests.
no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu
In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.
1 code implementation • 25 Mar 2021 • Chunwei Tian, Yong Xu, WangMeng Zuo, Chia-Wen Lin, David Zhang
In this paper, we propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution.
no code implementations • 17 Feb 2021 • Damián Marelli, Yong Xu, Minyue Fu, Zenghong Huang
As the second step towards our goal we complement the proposed method with a fully distributed method for estimating the optimal step size that maximizes convergence speed.
Distributed Optimization
Optimization and Control
1 code implementation • 25 Jan 2021 • Jing Xu, Tszhang Guo, Yong Xu, Zenglin Xu, Kun Bai
Deep Convolutional Neural Networks (DCNNs) and their variants have recently been widely used in large-scale face recognition (FR).
no code implementations • 21 Jan 2021 • Zhenyi Zheng, Yue Zhang, Victor Lopez-Dominguez, Luis Sánchez-Tejerina, Jiacheng Shi, Xueqiang Feng, Lei Chen, Zilu Wang, Zhizhong Zhang, Kun Zhang, Bin Hong, Yong Xu, Youguang Zhang, Mario Carpentieri, Albert Fert, Giovanni Finocchio, Weisheng Zhao, Pedram Khalili Amiri
Existing methods to do so involve the application of an in-plane bias magnetic field, or incorporation of in-plane structural asymmetry in the device, both of which can be difficult to implement in practical applications.
Mesoscale and Nanoscale Physics
no code implementations • 21 Jan 2021 • Cong Wang, Yan Huang, Yuexian Zou, Yong Xu
However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which causes model mismatch and possibly results in color shift in deep models using ASM.
no code implementations • 8 Jan 2021 • Zun Wang, Chong Wang, Sibo Zhao, Shiqiao Du, Yong Xu, Bing-Lin Gu, Wenhui Duan
Molecular dynamics is a powerful simulation tool to explore material properties.
1 code implementation • ICCV 2021 • Xiaowei Liao, Yong Xu, Haibin Ling
Specifically, given two hypergraphs to be matched, we first construct an association hypergraph over them and convert the hypergraph matching problem into a node classification problem on the association hypergraph.
Ranked #7 on Graph Matching on Willow Object Class
no code implementations • ICCV 2021 • Tianyi Chen, Yi Liu, Yunfei Zhang, Si Wu, Yong Xu, Feng Liangbing, Hau San Wong
To ensure disentanglement among the variables, we maximize mutual information between the class-independent variable and synthesized images, map real images to the latent space of a generator to perform consistency regularization of cross-class attributes, and incorporate class semantic-based regularization into a discriminator's feature space.
no code implementations • 31 Dec 2020 • Haoran Ji, Yanzhao Liu, He Wang, Jiawei Luo, Jiaheng Li, Hao Li, Yang Wu, Yong Xu, Jian Wang
An essential ingredient to realize these quantum states is the magnetic gap in the topological surface states induced by the out-of-plane ferromagnetism on the surface of MnBi2Te4.
Materials Science
no code implementations • 24 Dec 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu
Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR)
no code implementations • 23 Dec 2020 • Zhijun Hu, Yong Xu, Jie Wen, Lilei Sun, Raja S P
Moreover, designing a Euclidean distance threshold between all center pairs not only strengthens the inter-class separability of center loss, but also makes the center loss (or DDCL) work well without being combined with softmax loss.
no code implementations • 22 Dec 2020 • Jiong-Hao Wang, Yan-Bin Yang, Ning Dai, Yong Xu
Here we predict the existence of a second-order topological insulating phase in an amorphous system without any crystalline symmetry.
Mesoscale and Nanoscale Physics Disordered Systems and Neural Networks
no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu
The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Sep 2020 • Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling
The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing.
1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen
Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.
1 code implementation • 16 Aug 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu
Speech separation algorithms are often used to separate the target speech from other interfering sources.
1 code implementation • 21 Jul 2020 • Jinxiu Liang, Jingwen Wang, Yuhui Quan, Tianyi Chen, Jiaying Liu, Haibin Ling, Yong Xu
REG progressively and efficiently produces intermediate images corresponding to various exposure settings, and such pseudo-exposures are then fused by MED to detect faces across different lighting conditions.
1 code implementation • 8 Jul 2020 • Chunwei Tian, Yong Xu, WangMeng Zuo, Bo Du, Chia-Wen Lin, David Zhang
The enhancement block gathers and fuses the global and local features to provide complementary information for the latter network.
1 code implementation • 8 Jul 2020 • Chunwei Tian, Ruibin Zhuge, Zhihao Wu, Yong Xu, WangMeng Zuo, Chen Chen, Chia-Wen Lin
Finally, the IRB uses coarse high-frequency features from the RB to learn more accurate SR features and construct an SR image.
no code implementations • 4 Jul 2020 • Jinxiu Liang, Yong Xu, Yuhui Quan, Jingwen Wang, Haibin Ling, Hui Ji
Low-light images, i.e., images captured in low-light conditions, suffer from very poor visibility caused by low contrast, color distortion and significant measurement noise.
1 code implementation • 8 May 2020 • Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu
Purely neural network (NN) based speech separation and enhancement methods, although they can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful to automatic speech recognition (ASR).
Audio and Speech Processing Sound
no code implementations • 11 Apr 2020 • Bin Pei, Yuzuru Inahama, Yong Xu
This paper is devoted to a system of stochastic partial differential equations (SPDEs) that have a slow component driven by fractional Brownian motion (fBm) with the Hurst parameter $H >1/2$ and a fast component driven by fast-varying diffusion.
Probability Dynamical Systems 60G22, 60H05, 60H15, 34C29
no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu
Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.
no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.
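As a rough illustration of the inter-channel phase difference (IPD) feature named above, the sketch below computes it for one time-frequency bin of a microphone pair. It is not taken from the paper: the function name `inter_channel_phase_difference`, the sign convention (first channel minus second), and the sample values are assumptions for the example.

```python
import cmath


def inter_channel_phase_difference(bin_a, bin_b):
    """Phase of channel A minus phase of channel B for one STFT bin.

    bin_a, bin_b: complex STFT coefficients of the same time-frequency
    bin in two microphone channels (hypothetical inputs for this sketch).
    Multiplying by the conjugate keeps the result wrapped to [-pi, pi].
    """
    return cmath.phase(bin_a * bin_b.conjugate())


# Two bins with different magnitudes but a 0.3 rad phase offset.
a = cmath.rect(1.0, 0.5)  # magnitude 1.0, phase 0.5 rad
b = cmath.rect(2.0, 0.2)  # magnitude 2.0, phase 0.2 rad
ipd = inter_channel_phase_difference(a, b)  # approximately 0.3
```

The feature depends only on the phase offset between the channels and not on their magnitudes, which is why it carries spatial (direction-related) information.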
no code implementations • 13 Feb 2020 • Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang
Speaker diarization, which is to find the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems.
no code implementations • 31 Dec 2019 • Chunwei Tian, Lunke Fei, Wenxian Zheng, Yong Xu, WangMeng Zuo, Chia-Wen Lin
However, there are substantial differences in the various types of deep learning methods dealing with image denoising.
no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu
The initial solutions introduced for deep learning based speech separation transformed the speech signals into the time-frequency domain with STFT; the encoded mixed signals were then fed into a deep neural network based separator.
no code implementations • NeurIPS 2019 • Lingyu Liang, Lianwen Jin, Yong Xu
In practical verification, we design a new regularization structure with guided feature to produce GNN-based filtering and propagation diffusion to tackle the ill-posed inverse problems of quotient image analysis (QIA), which recovers the reflectance ratio as a signature for image analysis or adjustment.
no code implementations • 16 Sep 2019 • Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.
Audio and Speech Processing Sound Signal Processing
1 code implementation • Neural Networks 2019 • Chunwei Tian, Yong Xu, WangMeng Zuo
In this paper, we report the design of a novel network called a batch-renormalization denoising network (BRDNet).
no code implementations • 25 Aug 2019 • Weida Yang, Xindong Ai, Zuliu Yang, Yong Xu, Yong Zhao
To improve the performance in ill-posed regions, this paper proposes an atrous granular multi-scale network based on a depth edge subnetwork (Dedge-AGMNet).
no code implementations • 12 Jul 2019 • Chun-Mei Feng, Kai Wang, Shijian Lu, Yong Xu, Heng Kong, Ling Shao
The deep sub-network learns from the residuals of the high-frequency image information, where multiple residual blocks are cascaded to magnify the MRI images at the last network layer.
1 code implementation • 14 Jun 2019 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available.
no code implementations • 10 Jun 2019 • Chun-Mei Feng, Yong Xu, Zuoyong Li, Jian Yang
It performs Sparse Representation Fusion based on the Diverse Subset of training samples (SRFDS), which reduces the impact of randomness of the sample set and enhances the robustness of classification results.
no code implementations • 28 May 2019 • Chun-Mei Feng, Yong Xu, Jin-Xing Liu, Ying-Lian Gao, Chun-Hou Zheng
To overcome this problem, this study developed a new PCA method, which is named the Supervised Discriminative Sparse PCA (SDSPCA).
no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency-domain and time-domain separators, utilizing spectral, spatial and speaker location information.
no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation.
no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.
Audio and Speech Processing Sound
no code implementations • 10 Nov 2018 • Ruotao Xu, Yuhui Quan, Yong Xu
Aiming at separating the cartoon and texture layers from an image, cartoon-texture decomposition approaches resort to image priors to model cartoon and texture respectively.
no code implementations • 28 Oct 2018 • Chunwei Tian, Yong Xu, Lunke Fei, Junqian Wang, Jie Wen, Nan Luo
Owing to flexible architectures of deep convolutional neural networks (CNNs), CNNs are successfully used for image denoising.
no code implementations • 11 Oct 2018 • Chunwei Tian, Yong Xu, Lunke Fei, Ke Yan
Since the proposal of big data analysis and the Graphics Processing Unit (GPU), deep learning technology has received a great deal of attention and has been widely applied in the field of image processing.
1 code implementation • CVPR 2019 • Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling
In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking.
no code implementations • ECCV 2018 • Zheng Zhang, Li Liu, Jie Qin, Fan Zhu, Fumin Shen, Yong Xu, Ling Shao, Heng Tao Shen
How to economically cluster large-scale multi-view images is a long-standing problem in computer vision.
no code implementations • 17 Sep 2018 • Jie Wen, Zheng Zhang, Yong Xu, Zuofeng Zhong
Clustering with incomplete views is a challenge in multi-view clustering.
2 code implementations • 12 Apr 2018 • Qiuqiang Kong, Yong Xu, Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley
Sound event detection (SED) aims to detect when and recognize what sound events happen in an audio clip.
Sound Audio and Speech Processing
1 code implementation • CVPR 2018 • Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu
We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.
2 code implementations • 8 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley
First, we propose a separation mapping from the time-frequency (T-F) representation of an audio to the T-F segmentation masks of the audio events.
Sound Audio and Speech Processing
5 code implementations • 2 Nov 2017 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley
Then the classification of a bag is the expectation of the classification output of the instances in the bag with respect to the learned probability measure.
Sound Audio and Speech Processing
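The bag-level rule stated above (the bag's classification is the expectation of the instance outputs under a learned probability measure) can be sketched numerically. This is a minimal sketch, not the authors' implementation: the function name `bag_prediction` and the use of non-negative scores normalized to sum to one as the "learned measure" are assumptions for the example.

```python
def bag_prediction(instance_outputs, instance_scores):
    """Expectation of instance outputs under a normalized measure.

    instance_outputs: per-instance classification outputs in [0, 1].
    instance_scores: non-negative per-instance scores; normalizing them
    to sum to 1 stands in for the learned probability measure here.
    """
    total = sum(instance_scores)
    if total <= 0:
        raise ValueError("instance scores must have a positive sum")
    measure = [s / total for s in instance_scores]
    # E_q[f(x)] = sum_i q(x_i) * f(x_i)
    return sum(q * f for q, f in zip(measure, instance_outputs))


# Three instances; the first receives 60% of the probability mass.
bag_output = bag_prediction([0.9, 0.1, 0.5], [3.0, 1.0, 1.0])  # about 0.66
```

With uniform scores this reduces to plain averaging over instances; non-uniform scores let informative instances dominate the bag-level decision.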
3 code implementations • 1 Oct 2017 • Yong Xu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge.
Sound • Audio and Speech Processing
no code implementations • 12 Jul 2017 • Zheng Zhang, Yong Xu, Ling Shao, Jian Yang
In particular, the elaborate BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semi-supervised framework of low-rank representation.
3 code implementations • CVPR 2017 • Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, WangMeng Zuo
Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities of the source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable.
no code implementations • 29 Mar 2017 • Junyu Luo, Yong Xu, Chenwei Tang, Jiancheng Lv
The inverse mapping of the generator of a GAN (Generative Adversarial Net) has great potential value. Hence, some works have been developed to construct the inverse function of the generator by direct learning or adversarial learning. While the results are encouraging, the problem is highly challenging, and the existing ways of training inverse models of GANs have many disadvantages, such as being hard to train or performing poorly. For these reasons, we propose a new approach that uses the inverse generator ($IG$) model as the encoder and a pre-trained generator ($G$) as the decoder of an autoencoder network to train the $IG$ model.
no code implementations • 21 Mar 2017 • Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee
We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals.
Sound
1 code implementation • 17 Mar 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge.
Sound
2 code implementations • 24 Feb 2017 • Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging.
1 code implementation • 6 Oct 2016 • Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley
The labeling of an audio clip is often based on the audio events in the clip and no event level label is provided to the user.
Sound
2 code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley
For unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-filter bank (MFB) features.
no code implementations • 13 Jul 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Mark D. Plumbley
In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework.
no code implementations • JEPTALNRECITAL 2016 • François Yvon, Yong Xu, Marianna Apidianaki, Clément Pillias, Cubaud Pierre
The work that led to this demonstration combines multilingual language-processing tools, in particular automatic alignment, with visualization and interaction techniques.
no code implementations • 24 Jun 2016 • Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley
Compared with the conventional Gaussian Mixture Model (GMM) and support vector machine (SVM) methods, the proposed fully DNN-based method can make good use of long-term temporal information by taking the whole chunk as input.
no code implementations • 15 Jun 2016 • Zheng Zhang, Yong Xu, Cheng-Lin Liu
Natural scene character recognition is challenging due to the cluttered background, which is hard to separate from text.
no code implementations • CVPR 2016 • Yuhui Quan, Yong Xu, Yuping Sun, Yan Huang, Hui Ji
Discriminative sparse coding has emerged as a promising technique in image analysis and recognition, which couples the process of classifier training and the process of dictionary learning for improving the discriminability of sparse codes.
no code implementations • LREC 2016 • Yong Xu, François Yvon
Resources for evaluating sentence-level and word-level alignment algorithms are unsatisfactory.
no code implementations • 23 Feb 2016 • Zheng Zhang, Yong Xu, Jian Yang, Xuelong Li, David Zhang
The main purpose of this article is to provide a comprehensive study and an updated review of sparse representation and to offer guidance for researchers.
no code implementations • ICCV 2015 • Yu Luo, Yong Xu, Hui Ji
The paper aims to develop an effective algorithm for removing the visual effects of rain from a single rainy image, i.e., separating the rain layer and the de-rained image layer from a rainy image.
no code implementations • CVPR 2014 • Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
Based on the concept of lacunarity in fractal geometry, we developed a statistical approach to texture description, which yields highly discriminative features with strong robustness to a wide range of transformations, including photometric and geometric changes.