Leveraging the linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm integrates the rich information in N-best candidates to generate a higher-quality translation result.
Recent studies have shown that large language models (LLMs) can be used successfully for generative error correction (GER) on top of automatic speech recognition (ASR) output.
To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.
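One way to picture a language-space noise embedding is as pooled statistics over sentence vectors of the N-best hypotheses: the more the hypotheses disagree, the noisier the source speech likely was. The sketch below is purely illustrative, not the paper's method; the toy `encode` function (deterministic random vectors seeded from the text) stands in for a real text encoder, and the 32/64 dimensions are assumptions.

```python
import numpy as np

# Hypothetical sketch: each N-best hypothesis would be mapped to a sentence
# vector by a text encoder; seeded random vectors stand in for embeddings here.
def encode(hyp, dim=32):
    seed = sum(map(ord, hyp)) % (2**32)  # deterministic toy "encoder"
    return np.random.default_rng(seed).standard_normal(dim)

def noise_embedding(nbest):
    """Pool hypothesis embeddings; their spread reflects how much the N-best
    hypotheses disagree, a rough proxy for the source noise condition."""
    vecs = np.stack([encode(h) for h in nbest])
    return np.concatenate([vecs.mean(axis=0), vecs.std(axis=0)])

nbest = ["turn left at the light",
         "turn left at the night",
         "turn lefts at the light"]
emb = noise_embedding(nbest)
print(emb.shape)  # (64,)
```

The resulting fixed-size vector could then condition the LLM-based denoising step alongside the hypothesis texts themselves.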
In this paper, we present an approach that improves the ability of LLMs to generate automated diagnoses by incorporating a medical knowledge graph (KG) and a novel graph model, Dr. Knows, inspired by the clinical diagnostic reasoning process.
Specifically, we design a noise classification (NC) model to produce acoustic embedding as a noise conditioner for guiding the reverse denoising process.
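A minimal NumPy sketch of the conditioning idea, with purely hypothetical dimensions and random weights in place of the trained NC model and denoiser: the noise embedding is computed from the noisy frames and concatenated into the input of a reverse-denoising step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 80-dim acoustic frames,
# 16-dim noise embedding.
FEAT_DIM, EMB_DIM = 80, 16

# Toy stand-in for the trained noise classification (NC) front-end:
# pool the noisy frames and project them to a fixed-size noise embedding.
W_nc = rng.standard_normal((FEAT_DIM, EMB_DIM)) * 0.1

def noise_embedding(noisy_frames):
    pooled = noisy_frames.mean(axis=0)  # (FEAT_DIM,)
    return np.tanh(pooled @ W_nc)       # (EMB_DIM,)

# Toy denoiser: one linear layer over [noisy_frame ; noise_embedding],
# standing in for one reverse-denoising step guided by the conditioner.
W_dn = rng.standard_normal((FEAT_DIM + EMB_DIM, FEAT_DIM)) * 0.1

def denoise_step(noisy_frames, cond):
    cond_tiled = np.tile(cond, (noisy_frames.shape[0], 1))
    x = np.concatenate([noisy_frames, cond_tiled], axis=1)
    return x @ W_dn  # predicted clean frames

frames = rng.standard_normal((50, FEAT_DIM))  # 50 noisy frames
cond = noise_embedding(frames)
clean_est = denoise_step(frames, cond)
print(clean_est.shape)  # (50, 80)
```

In a real system the projections would be learned networks and the denoiser a diffusion-style model; the point here is only how the acoustic embedding enters the denoising computation as a conditioner.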
In this paper, we aim to learn the shared representations across modalities to bridge their gap.
In this work, we investigate the noise-invariant visual modality to strengthen the robustness of AVSR; our method can adapt to any test-time noise without depending on noisy training data, a.k.a. unsupervised noise adaptation.
However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for the downstream speech recognition task.
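The contrast between plain concatenation and an interaction-aware fusion can be sketched with a single cross-attention head (NumPy, identity query/key/value projections, hypothetical dimensions): audio frames query the visual stream, so each fused frame carries visual context weighted by audio-visual affinity rather than a blind side-by-side stack.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 20, 64  # hypothetical: 20 time steps, 64-dim features per modality

audio = rng.standard_normal((T, D))
visual = rng.standard_normal((T, D))

# Baseline fusion by concatenation: no interaction across modalities.
concat_fused = np.concatenate([audio, visual], axis=1)  # (T, 2D)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    """Single-head scaled dot-product attention: audio queries visual."""
    scores = q_feats @ kv_feats.T / np.sqrt(D)  # (T, T) audio-visual affinities
    attn = softmax(scores, axis=-1)
    return attn @ kv_feats  # visual context aligned to each audio frame

attended = cross_attention(audio, visual)
fused = np.concatenate([audio, attended], axis=1)  # (T, 2D), interaction-aware
print(concat_fused.shape, fused.shape)
```

Real AVSR models would add learned projections, multiple heads, and layer stacking; the sketch only shows where the explicit audio-visual interaction enters.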
In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude.
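As a rough sketch of resolving gradient interference on both the angle and magnitude axes, the snippet below uses a PCGrad-style projection (an illustrative assumption, not necessarily the paper's exact GR formulation): when the auxiliary gradient points against the main ASR gradient, its conflicting component is projected away (angle), then it is rescaled to the main gradient's norm (magnitude) before summation.

```python
import numpy as np

def remedy(g_main, g_aux):
    """Sketch of a gradient remedy: fix the angle, then the magnitude."""
    dot = g_aux @ g_main
    if dot < 0:  # obtuse angle -> the two task gradients interfere
        # Angle step: remove the component of g_aux that opposes g_main.
        g_aux = g_aux - dot / (g_main @ g_main) * g_main
    norm = np.linalg.norm(g_aux)
    if norm > 0:
        # Magnitude step: match the auxiliary norm to the main norm.
        g_aux = g_aux * (np.linalg.norm(g_main) / norm)
    return g_main + g_aux

g_asr = np.array([1.0, 0.0])   # main task gradient
g_enh = np.array([-1.0, 1.0])  # auxiliary gradient, conflicts with g_asr
g = remedy(g_asr, g_enh)
print(g)  # [1. 1.]: conflict removed, magnitudes matched
```

After the remedy, the combined update no longer contains a component that pushes against the main recognition objective.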
The final trained model was also evaluated on an independent test set by the CMRxMotion organisers, achieving a classification accuracy of 72.5% and a Cohen's kappa of 0.6309 (ranked first in this grand challenge).
Understanding a speaker's feelings and producing appropriate responses with emotional connection is a key communicative skill for empathetic dialogue systems.
The proposed method was evaluated on a public brain MRI data set for age estimation.
The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences.
In this paper, we present a generic deep convolutional neural network (DCNN) for multi-class image segmentation.
To address this problem, we propose a generic semi-supervised learning framework for image segmentation based on a deep convolutional neural network (DCNN).
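One common semi-supervised recipe for segmentation is self-training with pseudo-labels; the sketch below is an illustrative assumption (the paper's framework may differ), with a fixed per-pixel logistic model standing in for a trained DCNN: unlabeled images are segmented, and only high-confidence pixels are kept as pseudo-labels for the next training round.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for a trained DCNN: a per-pixel logistic model on
# intensity (a real system would run a trained segmentation network here).
def predict_prob(images):
    return 1.0 / (1.0 + np.exp(-12.0 * (images - 0.5)))

# Unlabeled images: keep pseudo-labels only where the model is confident,
# then feed (image, pseudo-mask, confidence-mask) back into training.
unlabeled = rng.uniform(0, 1, size=(4, 8, 8))
prob = predict_prob(unlabeled)
confident = (prob > 0.9) | (prob < 0.1)        # pixels we trust
pseudo_mask = (prob > 0.5).astype(np.float32)  # hard pseudo-labels

print(f"{confident.mean():.2f} of pixels pseudo-labelled")
```

Low-confidence pixels are simply masked out of the loss, so label noise from the model's own mistakes is limited to regions where it is already sure.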
We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods.
Recognising dialogue acts (DA) is important for many natural language processing tasks such as dialogue generation and intention recognition.
This paper describes the system that we submitted for SemEval-2018 task 10: capturing discriminative attributes.