Speaker Identification
61 papers with code • 4 benchmarks • 4 datasets
Latest papers
SIG: Speaker Identification in Literature via Prompt-Based Generation
Identifying the speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including out-of-domain inference for unseen speakers and non-explicit cases where there are no speaker mentions in the surrounding context.
InstructERC: Reforming Emotion Recognition in Conversation with a Retrieval Multi-task LLMs Framework
The field of emotion recognition in conversation (ERC) has focused on separating sentence feature encoding from context modeling, with little exploration of generative paradigms based on unified designs.
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Wav2vec2 has achieved success in applying the Transformer architecture and self-supervised learning to speech recognition.
Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment
Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech.
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals
We find that a greater adversarial weight for the initial layers leads to performance improvement.
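The finding above suggests a simple schedule in which earlier layers receive larger adversarial weights than later ones. As a minimal sketch (the function name, the linear decay, and the default bounds are assumptions for illustration, not the paper's actual schedule):

```python
def layer_weights(n_layers: int, w_max: float = 1.0, w_min: float = 0.1) -> list:
    """Hypothetical schedule: linearly decay the adversarial loss weight
    from the first layer (w_max) to the last (w_min), reflecting the
    observation that stronger speaker disentanglement in early layers helps."""
    step = (w_max - w_min) / max(n_layers - 1, 1)
    return [w_max - i * step for i in range(n_layers)]

# For a 4-layer encoder, earlier layers get larger adversarial weights.
weights = layer_weights(4)
```

Each per-layer weight would then scale that layer's adversarial (speaker-discrimination) loss term during training.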
MPCHAT: Towards Multimodal Persona-Grounded Conversation
In order to build self-consistent personalized dialogue agents, previous research has mostly focused on textual persona that delivers personal facts or personalities.
GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding
Addressing the question of who says what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention.
Unsupervised Speech Representation Pooling Using Vector Quantization
The pooling problem, however, remains: the length of speech representations is inherently variable.
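To see why variable length makes pooling non-trivial, consider that padded frames must be excluded from any aggregate. A common baseline is mean pooling over only the valid frames (a sketch of the problem setup, not the paper's vector-quantization method; the shapes and values are illustrative assumptions):

```python
import numpy as np

def mean_pool(frames: np.ndarray, length: int) -> np.ndarray:
    """Average only the valid (unpadded) frames of one utterance."""
    return frames[:length].mean(axis=0)

# Hypothetical batch: two utterances padded to 5 frames, feature dim 3.
batch = np.zeros((2, 5, 3))
batch[0, :2] = [[1, 1, 1], [3, 3, 3]]   # utterance 0: 2 valid frames
batch[1, :4] = 1.0                      # utterance 1: 4 valid frames
lengths = [2, 4]

# Pool each utterance to a fixed-size vector regardless of its length.
pooled = np.stack([mean_pool(u, n) for u, n in zip(batch, lengths)])
```

Naively averaging over all 5 frames would dilute the result with zero padding, which is exactly the failure mode that length-aware pooling avoids.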
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships.
MelHuBERT: A simplified HuBERT on Mel spectrograms
Self-supervised models have had great success in learning speech representations that can generalize to various downstream tasks.