Search Results for author: Siqi Zheng

Found 26 papers, 13 papers with code

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

1 code implementation • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.

Self-Supervised Learning, Speaker Diarization

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

1 code implementation • 19 Feb 2024 • Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

We also validate the efficiency of Language-Codec on downstream speech language models.

Audio Compression, Audio Generation

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that elaborate designs are unnecessary: an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector (the only trainable component) is competent for the ASR task.

Automatic Speech Recognition (ASR)

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

1 code implementation • 8 Nov 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.

Decoder
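The two objectives being compared can be illustrated on a toy sequence. This is a minimal plain-Python sketch, not the paper's implementation: the helper names (token_nll, sequence_loss), the is_output flags, and the toy logits are all hypothetical, and targets are assumed to be already shifted for next-token prediction.

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of `target` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def sequence_loss(step_logits, targets, is_output, mask_inputs=True):
    """Mean next-token cross-entropy over one decoder-only sequence (hypothetical helper).

    targets[i] is the token the model should predict at step i (already shifted).
    With mask_inputs=True (loss masking), steps whose target is an input speech
    token are skipped, so only output text tokens are scored. With
    mask_inputs=False, every step contributes to the average.
    """
    losses = [
        token_nll(logits, target)
        for logits, target, out in zip(step_logits, targets, is_output)
        if out or not mask_inputs
    ]
    return sum(losses) / len(losses)

# Toy example: 3-token vocabulary, two speech-token targets followed by two text-token targets.
step_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
targets = [1, 1, 2, 0]
is_output = [False, False, True, True]

masked = sequence_loss(step_logits, targets, is_output, mask_inputs=True)
unmasked = sequence_loss(step_logits, targets, is_output, mask_inputs=False)
```

With loss masking, the poorly predicted speech-token target at the first step does not enter the average; without it, that step raises the loss even though it says nothing about transcription quality.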

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio Captioning, Automatic Speech Recognition

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

no code implementations • 19 Sep 2023 • Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

Speaker diarization has gained considerable attention within the speech processing research community.

Speaker Diarization

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition, Speech Recognition

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

1 code implementation • 5 Aug 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang

It assigns the representations of augmented views of an utterance to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between views.

Representation Learning, Speaker Verification

Improving BERT with Hybrid Pooling Network and Drop Mask

no code implementations • 14 Jul 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Yukun Ma, Siqi Zheng

Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks.

Language Modelling, Masked Language Modeling

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

1 code implementation • 27 Jun 2023 • Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community.

Disentanglement, Self-Supervised Learning

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

no code implementations • 22 May 2023 • Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen

In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization.

Speaker Diarization

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

2 code implementations • 22 May 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi

This paper proposes a novel architecture called Enhanced Res2Net (ERes2Net), which incorporates both local and global feature fusion techniques to improve the performance.

Speaker Verification

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings

1 code implementation • 18 May 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models, e.g., BERT, without fine-tuning.

Language Modelling, Semantic Textual Similarity

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect

no code implementations • 14 Dec 2022 • Jinglin Liu, Zhenhui Ye, Qian Chen, Siqi Zheng, Wen Wang, Qinglin Zhang, Zhou Zhao

Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities.

Audio Synthesis

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

Pushing the limits of self-supervised speaker verification using regularized distillation framework

1 code implementation • 8 Nov 2022 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen

A range of experiments conducted on the VoxCeleb datasets demonstrate the superiority of the regularized DINO framework in speaker verification.

Data Augmentation, Self-Supervised Learning

Deep Representation Decomposition for Rate-Invariant Speaker Verification

no code implementations • 28 May 2022 • Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

Although deep speaker embeddings have achieved promising speaker verification performance, this advantage diminishes under speaking-style variability.

Speaker Verification

Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure

no code implementations • 26 Apr 2022 • Siqi Zheng, Hongbin Suo

In this paper, we propose to view clustering-based diarization as a community detection problem.

Clustering, Community Detection

Graph Convolutional Network Based Semi-Supervised Learning on Multi-Speaker Meeting Data

no code implementations • 25 Apr 2022 • Fuchuan Tong, Siqi Zheng, Min Zhang, Yafeng Chen, Hongbin Suo, Qingyang Hong, Lin Li

In this work, we present a GCN-based approach for semi-supervised learning.

Clustering, Speaker Recognition

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

1 code implementation • 18 Mar 2022 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.

Ranked #1 on Speaker Diarization on AliMeeting

Action Detection, Activity Detection

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

2 code implementations • 28 Nov 2021 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with a power set.

Action Detection, Activity Detection
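Power-set encoding can be illustrated with a bitmask: each subset of simultaneously active speakers in a frame becomes one class, so overlapping speech is predicted as a single label among 2^N classes for N speakers. This is a hypothetical sketch (encode_power_set and decode_power_set are illustrative names, not from the paper's code), and the paper may restrict or reorder the label set in ways not shown here.

```python
def encode_power_set(frame_activity):
    """Map a multi-hot speaker-activity vector to one power-set class index.

    Speaker k contributes bit k, so [1, 0, 1] (speakers 0 and 2 active) -> 0b101 = 5.
    """
    return sum(bit << k for k, bit in enumerate(frame_activity))

def decode_power_set(label, num_speakers):
    """Invert encode_power_set: recover the multi-hot vector from a class index."""
    return [(label >> k) & 1 for k in range(num_speakers)]

# Frame-level activity for 3 speakers: silence, single speakers, and overlaps.
frames = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1]]
labels = [encode_power_set(f) for f in frames]
print(labels)  # [0, 1, 2, 3, 5]
```

Because every frame now carries exactly one label, a model can be trained with an ordinary softmax classifier over the power-set classes instead of independent per-speaker sigmoid outputs.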

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences

1 code implementation • ICLR 2022 • Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhen-Hua Ling

We propose a novel Pooling Network (PoNet) for token mixing in long sequences with linear complexity.

Transfer Learning

BeamTransformer: Microphone Array-based Overlapping Speech Detection

no code implementations • 9 Sep 2021 • Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

We propose BeamTransformer, an efficient architecture that leverages the beamformer's strength in spatial filtering and the transformer's capability in contextual sequence modeling.

Measuring daily-life fear perception change: a computational study in the context of COVID-19

no code implementations • 27 Jul 2021 • Yuchen Chai, Juan Palacios, Jianghao Wang, Yichun Fan, Siqi Zheng

COVID-19, as a global health crisis, has triggered fear with unprecedented intensity.

Decision Making, Topic Models

A Real-time Speaker Diarization System Based on Spatial Spectrum

no code implementations • 20 Jul 2021 • Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting.

Speaker Diarization

Estimating air quality co-benefits of energy transition using machine learning

no code implementations • 29 May 2021 • Da Zhang, Qingyi Wang, Shaojie Song, Simiao Chen, MingWei Li, Lu Shen, Siqi Zheng, Bofeng Cai, Shenhao Wang

Applications of the framework with Chinese data reveal highly heterogeneous health benefits of reducing fossil fuel use in different sectors and regions in China, with a mean of $34/tCO2 and a standard deviation of $84/tCO2.

BIG-bench Machine Learning
