t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

15 Sep 2023

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +3

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

14 Aug 2023

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech.

Language Modelling Multi-Task Learning +2

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

13 Aug 2023

Nevertheless, existing methods emphasize the design of elegant KGC models to facilitate modality interaction, neglecting the real-life problem of missing modalities in KGs.

Multi-modal Knowledge Graph

ELFNet: Evidential Local-global Fusion for Stereo Matching

1 Aug 2023

Although existing stereo matching models have achieved continuous improvement, they often face issues related to trustworthiness due to the absence of uncertainty estimation.

Domain Generalization Stereo Matching

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

30 Jul 2023

As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information.

Ranked #1 on Multi-modal Entity Alignment on UMVM-oea-d-w-v2 (using extra training data)

Benchmarking Knowledge Graph Embeddings +2

On decoder-only architecture for speech-to-text and large language model integration

8 Jul 2023

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling Large Language Model +1

AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology

16 Jun 2023

In this pioneering study, inspired by AutoGPT, the state-of-the-art open-source application based on the GPT-4 large language model, we develop a novel tool called AD-AutoGPT which can conduct data collection, processing, and analysis about complex health narratives of Alzheimer's Disease in an autonomous manner via users' textual prompts.

Language Modelling Large Language Model

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

13 Jun 2023

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields.

Catalytic activity prediction Chemical-Disease Interaction Extraction +14

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

30 May 2023

State-of-the-art large-scale universal speech models (USMs) show decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition (ASR) +1

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

25 May 2023

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.

Language Modelling Multi-Task Learning +3

Newton-Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

24 May 2023

Reasoning about system dynamics is one of the most important analytical approaches for many scientific studies.

Revisit and Outstrip Entity Alignment: A Perspective of Generative Models

24 May 2023

We then reveal that their incomplete objective limits the capacity on both entity alignment and entity synthesis (i.e., generating new entities).

Entity Alignment

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

19 Apr 2023

While the use of 3D-aware GANs bypasses the requirement of 3D data, we further alleviate the necessity of style images with the CLIP model being the stylization guidance.

ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation

4 Apr 2023

Quantum many-body physics simulation has important impacts on understanding fundamental science and has applications to quantum materials design and quantum technology.

Inductive Bias Tensor Networks

Target Sound Extraction with Variable Cross-modality Clues

15 Mar 2023

Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources.

Target Sound Extraction

View Consistency Aware Holistic Triangulation for 3D Human Pose Estimation

22 Feb 2023

The rapid development of multi-view 3D human pose estimation (HPE) is attributed to the maturation of monocular 2D HPE and the geometry of 3D reconstruction.

3D Human Pose Estimation 3D Reconstruction +1

Speaker Change Detection for Transformer Transducer ASR

16 Feb 2023

Speaker change detection (SCD) is an important feature that improves the readability of the recognized words from an automatic speech recognition (ASR) system by breaking the word sequence into paragraphs at speaker change points.

Automatic Speech Recognition (ASR) +2

Domain-Agnostic Molecular Generation with Self-feedback

26 Jan 2023

To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation.

Language Modelling Molecular Docking +1

Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo

CVPR 2023

To detect more anchor pixels to ensure better adaptive patch deformation, we propose to evaluate the matching ambiguity of a certain pixel by checking the convergence of the estimated depth as optimization proceeds.

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

29 Dec 2022

Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) whose entities are associated with relevant images.

Ranked #1 on Entity Alignment on FBYG15k (using extra training data)

Knowledge Graphs Multi-modal Entity Alignment

BEATs: Audio Pre-Training with Acoustic Tokenizers

18 Dec 2022

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Audio Classification Self-Supervised Learning

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions

14 Dec 2022

We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions.


Exploring WavLM on Speech Enhancement

18 Nov 2022

There has been a surge of interest in self-supervised learning approaches for end-to-end speech encoding in recent years, as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

An Adapter based Multi-label Pre-training for Speech Separation and Enhancement

11 Nov 2022

In recent years, self-supervised learning (SSL) has achieved tremendous success in various speech tasks due to its power to extract representations from massive unlabeled data.

Denoising Pseudo Label +4

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

11 Nov 2022

Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs.

Automatic Speech Recognition (ASR) +2

Speech separation with large-scale self-supervised learning

9 Nov 2022

Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K hours trained WavLM, our proposed model obtains 15.9% and 11.2% of relative word error rate (WER) reductions, respectively, for a simulated far-field speech mixture test set.

Self-Supervised Learning Speech Separation

Simulating realistic speech overlaps improves multi-talker ASR

27 Oct 2022

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition (ASR) +2

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

24 Oct 2022

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

Tele-Knowledge Pre-training for Fault Analysis

20 Oct 2022

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation

19 Sep 2022

Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.

Monocular Depth Estimation

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

12 Sep 2022

To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray.

Automatic Speech Recognition (ASR) +2

Target-oriented Sentiment Classification with Sequential Cross-modal Semantic Graph

19 Aug 2022

Multi-modal aspect-based sentiment classification (MABSC) is the task of classifying the sentiment of a target entity mentioned in a sentence and an image.

Image Captioning Sentiment Analysis +1

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

4 Jul 2022

Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-model objectives.

Contrastive Learning Image Classification +3

Disentangled Ontology Embedding for Zero-shot Learning

8 Jun 2022

In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings guided by ontology properties to capture and utilize more fine-grained class relationships in different aspects.

Image Classification Ontology Embedding +2

Ultra Fast Speech Separation Model with Teacher Student Learning

27 Apr 2022

In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning).

Speech Separation

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

27 Apr 2022

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

30 Mar 2022

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition (ASR) +4

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer

17 Feb 2022

To enhance the generation and decipherability of purely data-driven models, scholars have integrated biochemical domain knowledge into these molecular study models.

Molecular Property Prediction Property Prediction

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

2 Feb 2022

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +1

A New Image Codec Paradigm for Human and Machine Uses

19 Dec 2021

Meanwhile, an image predictor is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.

Decision Making Image Compression +7

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

18 Dec 2021

Machine learning, especially deep neural networks, has achieved great success, but many of these methods rely on a number of labeled samples for supervision.

Data Augmentation Few-Shot Learning +10

Molecular Contrastive Learning with Chemical Element Knowledge Graph

1 Dec 2021

To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning.

Contrastive Learning Molecular Property Prediction +3

Continuous Speech Separation with Recurrent Selective Attention Network

28 Oct 2021

In this paper, we propose to apply a recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting.

Speech Recognition +1

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

20 Oct 2021

Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.

Automatic Speech Recognition (ASR) +2

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

18 Oct 2021

Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.

Speech Enhancement Speech Recognition +1

All-neural beamformer for continuous speech separation

13 Oct 2021

Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition (ASR) +2

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

12 Oct 2021

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

7 Oct 2021

Similar to the target-speaker voice activity detection (TS-VAD)-based diarization method, the E2E SA-ASR model is applied to estimate the speech activity of each speaker, while it has the advantages of (i) handling an unlimited number of speakers, (ii) leveraging linguistic information for speaker diarization, and (iii) simultaneously generating speaker-attributed transcriptions.

Action Detection Activity Detection +6

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

17 Sep 2021

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

7 Aug 2021

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection Activity Detection +3

Spacetime Neural Network for High Dimensional Quantum Dynamics

4 Aug 2021

We develop a spacetime neural network method with second order optimization for solving quantum dynamics from the high dimensional Schrödinger equation.

Vocal Bursts Intensity Prediction

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

8 Jul 2021

Our method achieves the state-of-the-art performance on ImageNet, 80.7% top-1 accuracy with 194M FLOPs.

Image Classification

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

6 Jul 2021

Our evaluation on the AMI meeting corpus reveals that after fine-tuning with a small amount of real data, the joint system performs 8.9–29.9% better in accuracy compared to the best modular system, while the modular system performs better before such fine-tuning.

Automatic Speech Recognition (ASR) +5

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

5 Jul 2021

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +3

Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming

28 Jun 2021

Automated commonsense reasoning is essential for building human-like AI systems featuring, for example, explainable AI.

End-to-End Speaker-Attributed ASR with Transformer

5 Apr 2021

This paper presents our recent effort on end-to-end speaker-attributed automatic speech recognition, which jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.

Automatic Speech Recognition (ASR) +2

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

31 Mar 2021

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +1

Continuous Speech Separation with Ad Hoc Microphone Arrays

3 Mar 2021

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array.

Speech Recognition +1

Knowledge-aware Zero-Shot Learning: Survey and Perspective

26 Feb 2021

Zero-shot learning (ZSL) which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a.

BIG-bench Machine Learning Zero-Shot Learning

OntoZSL: Ontology-enhanced Zero-shot Learning

15 Feb 2021

The key to implementing ZSL is to leverage the prior knowledge of classes which builds the semantic relationship between classes and enables the transfer of the learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes.

Image Classification Knowledge Graph Completion +2

Gauge Invariant and Anyonic Symmetric Autoregressive Neural Networks for Quantum Lattice Models

18 Jan 2021

Symmetries such as gauge invariance and anyonic symmetry play a crucial role in quantum many-body physics.

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

3 Nov 2020

Recently, an end-to-end speaker-attributed automatic speech recognition (E2E SA-ASR) model was proposed as a joint model of speaker counting, speech recognition and speaker identification for monaural overlapped speech.

Automatic Speech Recognition (ASR) +2

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

23 Oct 2020

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Speaker Separation Using Speaker Inventories and Estimated Speech

20 Oct 2020

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation Speech Extraction +2

An End-to-end Architecture of Online Multi-channel Speech Separation

7 Sep 2020

Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.

Speech Recognition +1

Brain Stroke Lesion Segmentation Using Consistent Perception Generative Adversarial Network

30 Aug 2020

The assistant network and the discriminator are employed to jointly decide whether the segmentation results are real or fake.

Lesion Segmentation

Continuous Speech Separation with Conformer

13 Aug 2020

Continuous speech separation plays a vital role in complicated speech related tasks such as conversation transcription.

Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

11 Aug 2020

However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.

Automatic Speech Recognition (ASR) +3

Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles

27 Jul 2020

We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.

Multi-Task Learning

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

19 Jun 2020

We propose an end-to-end speaker-attributed automatic speech recognition model that unifies speaker counting, speech recognition, and speaker identification on monaural overlapped speech.

Automatic Speech Recognition (ASR) +2

PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation

CVPR 2020

Portrait animation, which aims to animate a still portrait to life using poses extracted from target frames, is an important technique for many real-world entertainment applications.

Semantic Segmentation

Neural Speech Separation Using Spatially Distributed Microphones

28 Apr 2020

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

Speech Recognition +1

Generative Adversarial Zero-shot Learning via Knowledge Graphs

7 Apr 2020

However, the side information of classes currently in use is limited to text descriptions and attribute annotations, which fall short of capturing the semantics of the classes.

Image Classification Knowledge Graphs +1

Continuous speech separation: dataset and analysis

30 Jan 2020

In this paper, we define continuous speech separation (CSS) as a task of generating a set of non-overlapped speech signals from a continuous audio stream that contains multiple utterances that are partially overlapped by a varying degree.

Automatic Speech Recognition (ASR) +2

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

18 Nov 2019

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +3

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation

30 Oct 2019

An important problem in ad-hoc microphone speech separation is how to guarantee the robustness of a system with respect to the locations and numbers of microphones.

Speech Separation

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

14 Oct 2019

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches to conventional time-frequency-based methods.

Speech Separation

A Learning-Based Two-Stage Spectrum Sharing Strategy with Multiple Primary Transmit Power Levels

21 Jul 2019

Then, based on a novel normalized power level alignment metric, we propose two prediction-transmission structures, namely periodic and non-periodic, for spectrum access (the second part in Stage II), which enable the secondary transmitter (ST) to closely follow the PT power level variation.

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

12 Jul 2019

While similar toolkits built on top of the two are available, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

Speech Recognition

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

19 Jun 2019

In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound.

General Classification Image Classification +2

Low-Latency Speaker-Independent Continuous Speech Separation

13 Apr 2019

Speaker independent continuous speech separation (SI-CSS) is a task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals each of which contains no overlapping speech segment.

Speech Recognition +1

Understanding the Impact of Label Granularity on CNN-based Image Classification

21 Jan 2019

In this paper, we conduct extensive experiments using various datasets to demonstrate and analyze how and why training based on fine-grain labeling, such as "Persian cat" can improve CNN accuracy on classifying coarse-grain classes, in this case "cat."

General Classification Image Classification

Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

8 Oct 2018

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped.

Speech Recognition +1

Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing

17 Sep 2018

Recent advances in hardware technology have made deep-learning-based intelligent analysis at the front end more prevalent and practical.

Data Compression Feature Compression

Developing Far-Field Speaker System Via Teacher-Student Learning

14 Apr 2018

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Speaker-Invariant Training via Adversarial Learning

2 Apr 2018

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

21 Nov 2017

Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to unlabeled data from the target domain.

Automatic Speech Recognition (ASR) +4

Image Quality Assessment Guided Deep Neural Networks Training

13 Aug 2017

For many computer vision problems, the deep neural networks are trained and validated based on the assumption that the input images are pristine (i.e., artifact-free).

Data Augmentation Image Classification +1

Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning

16 Jul 2017

A standard approach to managing chronic diseases by the medical community is to have a committee of experts develop guidelines that all physicians should follow.


Speaker-independent Speech Separation with Deep Attractor Network

12 Jul 2017

A reference point, called an attractor, is created in the embedding space to represent each speaker and is defined as the centroid of that speaker in the embedding space.

Speech Separation

End-to-End Attention based Text-Dependent Speaker Verification

3 Jan 2017

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification

Deep attractor network for single-microphone speaker separation

27 Nov 2016

We propose a novel deep learning framework for single channel speech separation by creating attractor points in high dimensional embedding space of the acoustic signals which pull together the time-frequency bins corresponding to each source.

Speaker Separation Speech Separation

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

18 Nov 2016

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks.

Clustering Deep Clustering +3

A Physician Advisory System for Chronic Heart Failure Management Based on Knowledge Patterns

25 Oct 2016

In this paper we describe a physician-advisory system for CHF management that codes the entire set of clinical practice guidelines for CHF using answer set programming.


Single-Channel Multi-Speaker Separation using Deep Clustering

7 Jul 2016

In this paper we extend the baseline system with an end-to-end signal approximation objective that greatly improves performance on a challenging speech separation task.

Automatic Speech Recognition (ASR) +5

Deep clustering: Discriminative embeddings for segmentation and separation

18 Aug 2015

The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources.

Clustering Deep Clustering +3

