Alibaba Speech Translation Systems for IWSLT 2018

no code implementations IWSLT (EMNLP) 2018 Nguyen Bach, Hongjie Chen, Kai Fan, Cheung-Chi Leung, Bo Li, Chongjia Ni, Rong Tong, Pei Zhang, Boxing Chen, Bin Ma, Fei Huang

This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018.

Sentence Translation

Robust Identity Perceptual Watermark Against Deepfake Face Swapping

no code implementations 2 Nov 2023 Tianyi Wang, Mengxiao Huang, Harry Cheng, Bin Ma, Yinglong Wang

Despite offering convenience and entertainment to society, Deepfake face swapping has caused critical privacy issues with the rapid development of deep generative models.

Face Swapping

SPGM: Prioritizing Local Features for enhanced speech separation performance

no code implementations 22 Sep 2023 Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Dual-path is a popular architecture for speech separation models (e.g. Sepformer), which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships.

Speech Separation
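
The chunking step described in the abstract above can be illustrated with a minimal sketch. It is not the SPGM or Sepformer implementation: the function name `split_into_chunks`, the zero padding of the tail, and the toy chunk size and hop are all assumptions made for illustration.

```python
def split_into_chunks(sequence, chunk_size, hop):
    """Split `sequence` into overlapping chunks of length `chunk_size`.

    Consecutive chunks start `hop` items apart, so they overlap by
    `chunk_size - hop` items. The last chunk is zero-padded if needed
    (padding is an assumption of this sketch).
    """
    if not 0 < hop <= chunk_size:
        raise ValueError("hop must satisfy 0 < hop <= chunk_size")
    chunks = []
    start = 0
    while True:
        chunk = list(sequence[start:start + chunk_size])
        chunk += [0] * (chunk_size - len(chunk))  # pad the final chunk
        chunks.append(chunk)
        if start + chunk_size >= len(sequence):
            break
        start += hop
    return chunks


frames = list(range(1, 11))  # a toy "long sequence" of 10 frames
chunks = split_into_chunks(frames, chunk_size=4, hop=2)  # 50% overlap

# Intra-chunk view: each row is one chunk (local features).
intra_view = chunks
# Inter-chunk view: each row gathers the same position across all chunks
# (global relationships between chunks).
inter_view = [list(column) for column in zip(*chunks)]
```

In a dual-path model, an intra-block would process each row of `intra_view` and an inter-block each row of `inter_view`; here both are plain lists so the two views can be inspected directly.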

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

1 code implementation 20 May 2023 Jia Qi Yip, Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling.

Speaker Verification
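
As a rough illustration of how cross attention can stand in for temporal pooling, the sketch below lets a small, fixed set of query vectors attend over a variable number of frames, so the output size does not depend on the utterance length. This is a generic cross-attention sketch under stated assumptions, not the ACA-Net architecture: the names `cross_attention_pool` and `latents` are hypothetical, and the learned projections a real model would apply to queries, keys and values are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(latents, frames):
    """Pool a variable-length list of frames into len(latents) vectors.

    Each latent acts as a query; the frames act as both keys and values.
    Returns (pooled, weights), where weights[i] is the attention
    distribution of latent i over the frames.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    pooled, weights = [], []
    for query in latents:
        scores = [scale * sum(q * k for q, k in zip(query, frame))
                  for frame in frames]
        attn = softmax(scores)
        weights.append(attn)
        pooled.append([sum(w * frame[d] for w, frame in zip(attn, frames))
                       for d in range(dim)])
    return pooled, weights


latents = [[1.0, 0.0], [0.0, 1.0]]             # 2 queries (few)
short_utt = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3]]  # 3 frames
long_utt = short_utt * 50                       # 150 frames

pooled_short, weights_short = cross_attention_pool(latents, short_utt)
pooled_long, weights_long = cross_attention_pool(latents, long_utt)
```

Both `pooled_short` and `pooled_long` contain two vectors of dimension two, which is the property that lets such a layer replace a pooling step; the asymmetry is that the number of queries is much smaller than the number of frames.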

Immune Defense: A Novel Adversarial Defense Mechanism for Preventing the Generation of Adversarial Examples

no code implementations 8 Mar 2023 Jinwei Wang, Hao Wu, Haihua Wang, Jiawei Zhang, Xiangyang Luo, Bin Ma

Therefore, we propose a novel adversarial defense mechanism, referred to as immune defense, which is an example-based pre-defense.

Adversarial Defense

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

1 code implementation 23 Feb 2023 Shengkui Zhao, Bin Ma

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.

Speech Separation

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion

no code implementations 25 Oct 2022 Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li

To achieve this, we propose a novel EVC framework, Mixed-EVC, which only leverages discrete emotion training labels.

Attribute Voice Conversion

A Multi-scale Video Denoising Algorithm for Raw Image

no code implementations 5 Sep 2022 Bin Ma, Yueli Hu, Xianxian Lv, Kai Li

Video denoising for raw images has long been a difficult problem in camera image processing.

Image Denoising Motion Estimation +1

Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings

no code implementations 1 Aug 2022 Jia Qi Yip, Dianwen Ng, Bin Ma, Konstantin Pervushin, Eng Siong Chng

Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development.

Speaker Verification

Learning Disentangled Representations for Counterfactual Regression via Mutual Information Minimization

no code implementations 2 Jun 2022 Mingyuan Cheng, Xinru Liao, Quan Liu, Bin Ma, Jian Xu, Bo Zheng

Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies.

Causal Inference counterfactual +3

A Unified Speaker Adaptation Approach for ASR

1 code implementation EMNLP 2021 Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never been explored in ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

no code implementations 3 Feb 2021 Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma

Cross-lingual voice conversion (VC) is an important and challenging problem due to significant mismatches of the phonetic set and the speech prosody of different languages.

Voice Conversion

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

1 code implementation 3 Feb 2021 Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features.

Speech Enhancement

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

1 code implementation 16 Oct 2020 Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma

With these data, three neural TTS models (Tacotron2, Transformer and FastSpeech) are applied to build bilingual and code-switched TTS.

Speech Synthesis Voice Conversion

Cloud Cover and Aurora Contamination at Dome A in 2017 from KLCAM

no code implementations 7 Oct 2020 Xu Yang, Zhaohui Shang, Keliang Hu, Yi Hu, Bin Ma, Yongjiang Wang, Zihuang Cao, Michael C. B. Ashley, Wei Wang

Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz.

Instrumentation and Methods for Astrophysics

Flow Based Self-supervised Pixel Embedding for Image Segmentation

no code implementations 2 Jan 2019 Bin Ma, Shubao Liu, Yingxuan Zhi, Qi Song

Building on these, we demonstrate that image features can be learned in self-supervision by first training an optical flow estimator with synthetic flow data, and then learning image features from the estimated flows in real motion data.

Image Segmentation Optical Flow Estimation +2

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

no code implementations 10 Jun 2018 Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.

Dynamic Time Warping Word Embeddings

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations 5 Feb 2016 Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by the Institute for Infocomm Research (I²R), the Laboratoire d'Informatique de l'Université du Maine (LIUM), Nanyang Technological University (NTU) and the University of Eastern Finland (UEF) for the 2015 NIST Language Recognition Evaluation (LRE).


The similarity metric

no code implementations 20 Nov 2001 Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied.
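
To give a feel for distance measures between sequences, the sketch below computes the normalized compression distance (NCD), a practical compression-based approximation that is commonly associated with this line of work. It is offered only as an illustration under stated assumptions: the choice of `zlib` as the compressor, the helper names `compressed_size` and `ncd`, and the toy inputs are not taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Values near 0 suggest the inputs
    share most of their structure; values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"the quick brown fox leaps over the lazy cat " * 20

# Deterministic pseudo-random bytes that share no structure with the text.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text_a)))

distance_similar = ncd(text_a, text_b)
distance_unrelated = ncd(text_a, noise)
```

With a real compressor the result is only approximate (for instance, `ncd(x, x)` is small but not exactly zero), so the values are best used for ranking pairs by similarity rather than as exact distances.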

