Search Results for author: Ruijie Tao

Found 14 papers, 9 papers with code

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

no code implementations 1 Apr 2024 Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

Our experimental results on three created datasets demonstrated that VCA-NN effectively mitigates these dataset problems, which provides a new direction for handling the speaker recognition problems from the data aspect.

Speaker Recognition, Voice Conversion

Prompt-driven Target Speech Diarization

no code implementations 23 Oct 2023 Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li

We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal.

Action Detection, Activity Detection

Target Active Speaker Detection with Audio-visual Cues

1 code implementation 22 May 2023 Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

Audio-Visual Synchronization

Speaker recognition with two-step multi-modal deep cleansing

1 code implementation 28 Oct 2022 Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

However, noisy samples (i.e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.

Representation Learning, Speaker Recognition, +1

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations 27 Oct 2022 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning, Self-Supervised Learning, +1

HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE

3 code implementations 12 Nov 2021 Rohan Kumar Das, Ruijie Tao, Haizhou Li

This work provides a brief description of Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).

Domain Adaptation, Speaker Recognition

Ego4D: Around the World in 3,000 Hours of Egocentric Video

6 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification, Ethics

Selective Listening by Synchronizing Speech with Lips

1 code implementation 14 Jun 2021 Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance, or an accompanying video track.

Lip Reading, Target Speaker Extraction

Muse: Multi-modal target speaker extraction with visual cues

1 code implementation 15 Oct 2020 Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

Speaker extraction algorithm relies on the speech sample from the target speaker as the reference point to focus its attention.

Target Speaker Extraction
