Search Results for author: Yuexian Zou

Found 72 papers, 18 papers with code

A Transformer-based Threshold-Free Framework for Multi-Intent NLU

no code implementations COLING 2022 Lisung Chen, Nuo Chen, Yuexian Zou, Yong Wang, Xinzhong Sun

Furthermore, we propose a threshold-free multi-intent classifier that utilizes the output of the IND task and detects multiple intents without depending on a threshold.

Multi-Task Learning Natural Language Understanding

PoseRAC: Pose Saliency Transformer for Repetitive Action Counting

1 code implementation 15 Mar 2023 Ziyu Yao, Xuxin Cheng, Yuexian Zou

Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on two new version datasets by using Pose Saliency Annotation to annotate salient poses for training.

Improve Retrieval-based Dialogue System via Syntax-Informed Attention

no code implementations 12 Mar 2023 Tengtao Song, Nuo Chen, Ji Jiang, Zhihong Zhu, Yuexian Zou

Since incorporating syntactic information like dependency structures into neural models can promote a better understanding of the sentences, such a method has been widely used in NLP tasks.

Retrieval

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

1 code implementation 11 Mar 2023 Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, YaoWei Wang, David A. Clifton

We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.

Image Captioning Machine Translation +5

Improving Weakly Supervised Sound Event Detection with Causal Intervention

no code implementations 10 Mar 2023 Yifei Xin, Dongchao Yang, Fan Cui, Yujun Wang, Yuexian Zou

Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrences simultaneously, i.e., some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds, so they would be inevitably entangled, causing misclassification and biased localization results with only clip-level supervision.

Event Detection Sound Event Detection

FTM: A Frame-level Timeline Modeling Method for Temporal Graph Representation Learning

1 code implementation 23 Feb 2023 Bowen Cao, Qichen Ye, Weiyuan Xu, Yuexian Zou

Existing neighborhood aggregation strategies fail to capture either the short-term features or the long-term features of temporal graph attributes, leading to unsatisfactory model performance and even poor robustness and domain generality of the representation learning method.

Graph Representation Learning

FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering

1 code implementation 23 Feb 2023 Qichen Ye, Bowen Cao, Nuo Chen, Weiyuan Xu, Yuexian Zou

Despite the promising result of recent KAQA systems which tend to integrate linguistic knowledge from pre-trained language models (PLM) and factual knowledge from knowledge graphs (KG) to answer complex questions, a bottleneck exists in effectively fusing the representations from PLMs and KGs because of (i) the semantic and distributional gaps between them, and (ii) the difficulties in joint reasoning over the provided knowledge from both modalities.

Knowledge Graphs Question Answering

Generating Templated Caption for Video Grounding

no code implementations 15 Jan 2023 Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

Contrastive Learning Dense Video Captioning +1

Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

no code implementations 22 Nov 2022 Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun

To this end, we introduce the unpaired video captioning task, which aims to train models without coupled video-caption pairs in the target language.

Translation Video Captioning

A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding

1 code implementation 8 Nov 2022 Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou

Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios.

Intent Detection slot-filling +2

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

no code implementations 28 Oct 2022 Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu Sun, Yuexian Zou

To enhance the correlation between vision and language in disentangled spaces, we introduce the visual concepts to DiMBERT which represent visual information in textual format.

Image Captioning Language Modelling +2

Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning

no code implementations 19 Oct 2022 Fenglin Liu, Xuewei Ma, Xuancheng Ren, Xian Wu, Wei Fan, Yuexian Zou, Xu Sun

Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.

Image Captioning

Video Referring Expression Comprehension via Transformer with Content-aware Query

1 code implementation 6 Oct 2022 Ji Jiang, Meng Cao, Tengtao Song, Yuexian Zou

To this end, we introduce two new datasets (i.e., VID-Entity and VidSTG-Entity) by augmenting the VIDSentence and VidSTG datasets with the explicitly referred words in the whole sentence, respectively.

Referring Expression Referring Expression Comprehension

LocVTP: Video-Text Pre-training for Temporal Localization

1 code implementation 21 Jul 2022 Meng Cao, Tianyu Yang, Junwu Weng, Can Zhang, Jue Wang, Yuexian Zou

To further enhance the temporal reasoning ability of the learned feature, we propose a context projection head and a temporal aware contrastive loss to perceive the contextual relationships.

Retrieval Temporal Localization +1

Correspondence Matters for Video Referring Expression Comprehension

1 code implementation 21 Jul 2022 Meng Cao, Ji Jiang, Long Chen, Yuexian Zou

Extensive experiments demonstrate that our DCNet achieves state-of-the-art performance on both video and image REC benchmarks.

Contrastive Learning Referring Expression +2

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

1 code implementation 20 Jul 2022 Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.

Audio Generation

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation 5 Jun 2022 Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition (ASR) +1

Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

no code implementations 3 May 2022 Xinmeng Xu, Rongzhi Gu, Yuexian Zou

Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems.

Multi-Task Learning Speech Enhancement

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

no code implementations Findings (NAACL) 2022 Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations.

Conversational Question Answering Spoken Language Understanding +1

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

no code implementations 15 Apr 2022 Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou

Most existing research adopts supervised training for speaker extraction, while the scarcity of ideally clean corpora and the channel mismatch problem are rarely considered.

Domain Adaptation

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

no code implementations 4 Apr 2022 Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings.

Metric Learning Speaker Separation +1

Learning Decoupling Features Through Orthogonality Regularization

no code implementations 31 Mar 2022 Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

Bearing this in mind, a two-branch deep network (KWS branch and SV branch) with the same network structure is developed and a novel decoupling feature learning method is proposed to push up the performance of KWS and SV simultaneously where speaker-invariant keyword representations and keyword-invariant speaker representations are expected respectively.

Keyword Spotting Speaker Verification

SpatioTemporal Focus for Skeleton-based Action Recognition

no code implementations 31 Mar 2022 Liyu Wu, Can Zhang, Yuexian Zou

Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information from the body joints and parts.

Action Recognition Skeleton Based Action Recognition

Integrating Lattice-Free MMI into End-to-End Speech Recognition

1 code implementation 29 Mar 2022 Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.

Automatic Speech Recognition (ASR) +1

Unsupervised Pre-training for Temporal Action Localization Tasks

1 code implementation CVPR 2022 Can Zhang, Tianyu Yang, Junwu Weng, Meng Cao, Jue Wang, Yuexian Zou

These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization.

Contrastive Learning Representation Learning +4

CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter

1 code implementation 30 Nov 2021 Bang Yang, Tong Zhang, Yuexian Zou

DCD is an auxiliary task that requires a caption model to learn the correspondence between video content and concepts and the co-occurrence relations between concepts.

Representation Learning Video Captioning

On Pursuit of Designing Multi-modal Transformer for Video Grounding

no code implementations EMNLP 2021 Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou

Almost all existing video grounding methods fall into two frameworks: 1) Top-down model: It predefines a set of segment candidates and then conducts segment classification and regression.

Video Grounding

Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering

no code implementations Findings (EMNLP) 2021 Chenyu You, Nuo Chen, Yuexian Zou

In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage.

Question Answering Representation Learning

HAN: Higher-order Attention Network for Spoken Language Understanding

no code implementations 26 Aug 2021 Dongsheng Chen, Zhiqi Huang, Yuexian Zou

Spoken Language Understanding (SLU), including intent detection and slot filling, is a core component in human-computer interaction.

Intent Detection slot-filling +2

Fully Non-Homogeneous Atmospheric Scattering Modeling with Convolutional Neural Networks for Single Image Dehazing

no code implementations 25 Aug 2021 Cong Wang, Yan Huang, Yuexian Zou, Yong Xu

However, it is noted that ASM-based SIDM degrades its performance in dehazing real world hazy images due to the limited modelling ability of ASM where the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed as constants for one image.

Image Dehazing Single Image Dehazing

Joint Multiple Intent Detection and Slot Filling via Self-distillation

no code implementations 18 Aug 2021 Lisong Chen, Peilin Zhou, Yuexian Zou

With the auxiliary knowledge provided by the MIL Intent Decoder, we set Final Slot Decoder as the teacher model that imparts knowledge back to Initial Slot Decoder to complete the loop.

Intent Detection Multiple Instance Learning +3

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

no code implementations 12 Aug 2021 Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou

Recently proposed metric learning approaches improved the generalizability of models for the KWS task, and 1D-CNN based KWS models have achieved state-of-the-art (SOTA) results in terms of model size.

Metric Learning Small-Footprint Keyword Spotting

Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model

no code implementations 4 Jul 2021 Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

As a result, the proposed approach can handle various tasks including: Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and the existing unimodal MC models.

Knowledge Distillation Machine Reading Comprehension

Long-Short Temporal Modeling for Efficient Action Recognition

no code implementations 30 Jun 2021 Liyu Wu, Yuexian Zou, Can Zhang

Efficient long-short temporal modeling is key to enhancing the performance of the action recognition task.

Action Recognition

SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection

no code implementations 29 Jun 2021 Ranyu Ning, Can Zhang, Yuexian Zou

Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors, where the location and scale for action instances are set by designers.

Action Detection

All You Need is a Second Look: Towards Arbitrary-Shaped Text Detection

no code implementations 24 Jun 2021 Meng Cao, Can Zhang, Dongming Yang, Yuexian Zou

Compared to the traditional single-stage segmentation network, our NASK conducts the detection in a coarse-to-fine manner with the first stage segmentation spotting the rectangle text proposals and the second one retrieving compact representations.

Instance Segmentation Semantic Segmentation

Exploring Semantic Relationships for Unpaired Image Captioning

no code implementations 20 Jun 2021 Fenglin Liu, Meng Gao, Tianhao Zhang, Yuexian Zou

To further improve the quality of captions generated by the model, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.

Image Captioning

Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation

no code implementations CVPR 2021 Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou

In detail, PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias; PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.

Contrastive Attention for Automatic Chest X-ray Report Generation

no code implementations Findings (ACL) 2021 Xuewei Ma, Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Yuexian Zou, Ping Zhang, Xu Sun

In addition, according to the analysis, the CA model can help existing models better attend to the abnormal regions and provide more accurate descriptions which are crucial for an interpretable diagnosis.

Rethinking Skip Connection with Layer Normalization in Transformers and ResNets

no code implementations 15 May 2021 Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou

In this work, we investigate how the scale factors in the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient exploding or vanishing in line with the deepness of the models, which could be addressed by normalization, in particular, layer normalization, which induces consistent improvements over the plain skip connection.

Image Classification Machine Translation +1

RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection

no code implementations 30 Apr 2021 Dongming Yang, Yuexian Zou, Can Zhang, Meng Cao, Jie Chen

Upon the frame, an Interaction Intensifier Module and a Correlation Parsing Module are carefully designed, where: a) interactive semantics from humans can be exploited and passed to objects to intensify interactions, b) interactive correlations among humans, objects and interactions are integrated to promote predictions.

Human-Object Interaction Detection

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

no code implementations 8 Apr 2021 Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance.

Speech Recognition

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification

no code implementations 31 Mar 2021 Helin Wang, Yuexian Zou, Wenwu Wang

In this paper, we present SpecAugment++, a novel data augmentation method for deep neural networks based acoustic scene classification (ASC).

Acoustic Scene Classification Data Augmentation +2

CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning

1 code implementation CVPR 2021 Can Zhang, Meng Cao, Dongming Yang, Jie Chen, Yuexian Zou

In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short.

CoLA Contrastive Learning +3

FWB-Net: Front White Balance Network for Color Shift Correction in Single Image Dehazing via Atmospheric Light Estimation

no code implementations 21 Jan 2021 Cong Wang, Yan Huang, Yuexian Zou, Yong Xu

However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which brings model mismatch and possibly results in color shift in deep models using ASM.

Image Dehazing Single Image Dehazing

Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension

no code implementations 20 Dec 2020 Nuo Chen, Fenglin Liu, Chenyu You, Peilin Zhou, Yuexian Zou

To predict the answer, it is common practice to employ a predictor to draw information only from the final encoder layer, which generates the coarse-grained representations of the source sequences, i.e., passage and question.

Machine Reading Comprehension

Federated Learning for Spoken Language Understanding

no code implementations COLING 2020 Zhiqi Huang, Fenglin Liu, Yuexian Zou

To this end, we propose a federated learning framework, which could unify various types of datasets as well as tasks to learn and fuse various types of knowledge, i.e., text representations, from different datasets and tasks, without the sharing of downstream task data.

Federated Learning Intent Detection +4

Rethinking Skip Connection with Layer Normalization

no code implementations COLING 2020 Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou

In this work, we investigate how the scale factors in the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient exploding or vanishing in line with the deepness of the models, which could be addressed by normalization, in particular, layer normalization, which induces consistent improvements over the plain skip connection.

Image Classification Machine Translation +1

Prophet Attention: Predicting Attention with Future Attention

no code implementations NeurIPS 2020 Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun

Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.

Image Captioning

Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering

no code implementations 21 Oct 2020 Chenyu You, Nuo Chen, Yuexian Zou

Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow given the speech utterances and text corpora.

Audio Signal Processing Conversational Question Answering +2

Knowledge Distillation for Improved Accuracy in Spoken Question Answering

no code implementations 21 Oct 2020 Chenyu You, Nuo Chen, Yuexian Zou

However, the recent work shows that ASR systems generate highly noisy transcripts, which critically limit the capability of machine comprehension on the SQA task.

Automatic Speech Recognition (ASR) +5

Towards Data Distillation for End-to-end Spoken Conversational Question Answering

no code implementations 18 Oct 2020 Chenyu You, Nuo Chen, Fenglin Liu, Dongchao Yang, Yuexian Zou

In spoken question answering, QA systems are designed to answer questions from contiguous text spans within the related speech transcripts.

Automatic Speech Recognition (ASR) +2

PIN: A Novel Parallel Interactive Network for Spoken Language Understanding

no code implementations 28 Sep 2020 Peilin Zhou, Zhiqi Huang, Fenglin Liu, Yuexian Zou

However, we noted that, so far, the efforts to obtain better performance by supporting bidirectional and explicit information exchange between ID and SF are not well studied. In addition, few studies attempt to capture the local context information to enhance the performance of SF.

Intent Detection Language Modelling +3

A Graph-based Interactive Reasoning for Human-Object Interaction Detection

no code implementations 14 Jul 2020 Dongming Yang, Yuexian Zou

However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions.

Human-Object Interaction Detection

All you need is a second look: Towards Tighter Arbitrary shape text detection

no code implementations 26 Apr 2020 Meng Cao, Yuexian Zou

Specifically, NASK consists of a Text Instance Segmentation network, namely TIS (1st stage), a Text RoI Pooling module, and a Fiducial pOint eXpression module termed FOX (2nd stage).

Instance Segmentation Scene Text Detection +1

Multi-modal Multi-channel Target Speech Separation

no code implementations 16 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.

Speech Separation

GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency

no code implementations 11 Mar 2020 Dongming Yang, Yuexian Zou, Jian Zhang, Ge Li

The GID block breaks through the local neighborhoods and captures long-range dependency of pixels at both the global level and the instance level from the scene to help detect interactions between instances.

Human-Object Interaction Detection

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations 9 Mar 2020 Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.

Speech Separation

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation

no code implementations 2 Jan 2020 Rongzhi Gu, Yuexian Zou

To address these challenges, we propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture in reverberant environments, assisted with directional information of the speaker(s).

Speech Separation

Environmental Sound Classification with Parallel Temporal-spectral Attention

no code implementations 14 Dec 2019 Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Convolutional neural networks (CNN) are one of the best-performing neural network architectures for environmental sound classification (ESC).

Acoustic Scene Classification Classification +3

Non-Autoregressive Coarse-to-Fine Video Captioning

1 code implementation 27 Nov 2019 Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer generating generic descriptions due to the insufficient training of visual words (e.g., nouns and verbs) and inadequate decoding paradigm.

Video Captioning

C-RPNs: Promoting Object Detection in real world via a Cascade Structure of Region Proposal Networks

no code implementations 19 Aug 2019 Dongming Yang, Yuexian Zou, Jian Zhang, Ge Li

Although two-stage detectors like Faster R-CNN have achieved great success in object detection thanks to the strategy of extracting region proposals with a region proposal network, they adapt poorly to real-world object detection because they do not consider mining hard samples when extracting region proposals.

Object Detection +1

End-to-End Multi-Channel Speech Separation

no code implementations 15 May 2019 Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation.

Speech Separation
