no code implementations • 22 Jul 2024 • Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou
We evaluated the RAFT method across multiple datasets and analysed its performance in various reasoning tasks, including long-form QA and short-form QA tasks, tasks in both Chinese and English, and supportive and comparison reasoning tasks.
no code implementations • 18 Jul 2024 • Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou
The three main approaches to low-resourced ASR are phoneme-based supervised pre-training, subword-based supervised pre-training, and self-supervised pre-training over multilingual data.
Automatic Speech Recognition (ASR) +1
1 code implementation • 13 Jul 2024 • Xiangzhu Kong, Tianqi Ning, Hao Huang, Zhijian Ou
Recently, multi-channel end-to-end (ME2E) ASR systems have emerged.
1 code implementation • 4 Jun 2024 • Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou
We construct a common experimental setup based on the CommonVoice dataset, called CV-Lang10, with 10 seen languages and 2 unseen languages.
1 code implementation • 21 May 2024 • Yucheng Cai, Si Chen, Yuxuan Wu, Yi Huang, Junlan Feng, Zhijian Ou
Recently, increasing research interest has focused on retrieval-augmented generation (RAG) to mitigate hallucination in large language models (LLMs).
no code implementations • 16 Mar 2024 • Zhijian Ou
Therefore, the purpose of this monograph is to present a systematic introduction to energy-based models, including both algorithmic progress and applications in speech and language processing.
1 code implementation • 17 Nov 2023 • Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng
Inspired by the recently emerging prompt tuning method that performs well on dialog systems, we propose to use the prompt pool method, where we maintain a pool of key-value paired prompts and select prompts from the pool according to the distance between the dialog history and the prompt keys.
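The selection mechanism described above can be sketched as nearest-key lookup. The following is a minimal illustration, not the paper's implementation: pool size, dimensions, and the use of cosine distance are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of key-value paired prompts: each key is an embedding,
# each value stands in for a (trainable) prompt of the same dimension.
pool_size, dim, top_k = 8, 16, 3
prompt_keys = rng.normal(size=(pool_size, dim))
prompt_values = rng.normal(size=(pool_size, dim))

def select_prompts(history_embedding, keys, values, k):
    """Pick the k prompts whose keys are closest (cosine distance) to the query."""
    q = history_embedding / np.linalg.norm(history_embedding)
    kn = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    distance = 1.0 - kn @ q            # cosine distance to each prompt key
    chosen = np.argsort(distance)[:k]  # k nearest keys
    return values[chosen]

history = rng.normal(size=dim)  # stand-in for an encoded dialog history
selected = select_prompts(history, prompt_keys, prompt_values, top_k)
print(selected.shape)
```

The selected prompts would then be prepended to the model input; only the pool is updated across tasks, which is what makes the method attractive for continual learning of dialog systems.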
no code implementations • 20 Sep 2023 • Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li
Using the high-quality prompts generated, we scale the corpus of the pre-trained conversation model to 122 datasets from 15 dialog-related tasks, resulting in Universal Pre-trained Conversation Model (UniPCM), a powerful foundation model for various conversational tasks and different dialog systems.
1 code implementation • 22 May 2023 • Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng
Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database to get relevant knowledge to generate responses.
1 code implementation • 22 May 2023 • Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao
Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs).
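The contrast between the two model families can be made concrete on a toy example. In an ELM each sentence x gets an unnormalized score exp(-E(x)); the normalizer Z is a sum over all sentences, which is intractable in practice but computable in a tiny made-up universe of three sentences (all energies below are invented for illustration):

```python
import math

# Toy "universe" of three sentences with made-up energies E(x).
energies = {"a b": 1.0, "a c": 2.0, "b c": 3.0}

# Z = sum_x exp(-E(x)) is tractable only because the universe is tiny;
# for real vocabularies this sum is intractable, which is the central
# difficulty of energy-based language models.
Z = sum(math.exp(-e) for e in energies.values())
p = {s: math.exp(-e) / Z for s, e in energies.items()}

print(round(sum(p.values()), 6))  # probabilities sum to 1 after normalizing
```

An autoregressive LM sidesteps Z entirely by factoring p(x) into per-token conditionals, at the cost of a fixed left-to-right factorization.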
1 code implementation • 21 Apr 2023 • Xinwei Zhang, Zhiqiang Tan, Zhijian Ou
Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo. Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation.
1 code implementation • 17 Oct 2022 • Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng
Second, an important ingredient in a US is that the user goal can be effectively incorporated and tracked; but how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multi-domains has remained a challenge.
no code implementations • 13 Oct 2022 • Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng
Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems.
1 code implementation • 27 Sep 2022 • Hong Liu, Hao Peng, Zhijian Ou, Juanzi Li, Yi Huang, Junlan Feng
Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games.
1 code implementation • SIGDIAL (ACL) 2022 • Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng
In this paper, we propose to apply JSA to semi-supervised learning of the latent state TOD models, which is referred to as JSA-TOD.
1 code implementation • 6 Jul 2022 • Zhijian Ou, Junlan Feng, Juanzi Li, Yakun Li, Hong Liu, Hao Peng, Yi Huang, Jiangjiang Zhao
A challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems, co-located with the SereTOD Workshop at EMNLP 2022.
2 code implementations • 13 Apr 2022 • Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng
Recently, Transformer-based pretrained language models (PLMs), such as GPT-2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems.
no code implementations • 31 Mar 2022 • Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan
Based on the DR method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model.
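The density-ratio idea amounts to a score combination at decoding time: the external LM score is added while an estimate of the source-domain (internal) LM score is subtracted, and LODR supplies that estimate with a low-order weak LM. A hedged sketch of such a fused hypothesis score, with made-up weights and log-probabilities (the function name and parameters are illustrative, not from the paper):

```python
def fused_score(asr_logp, ext_lm_logp, low_order_lm_logp,
                lm_weight=0.5, lodr_weight=0.3):
    """Density-ratio-style fusion: add the external LM score, subtract a
    low-order LM's score as a stand-in for the internal LM estimate."""
    return asr_logp + lm_weight * ext_lm_logp - lodr_weight * low_order_lm_logp

# Toy hypothesis scores (all log-probabilities here are made up):
print(round(fused_score(-4.0, -2.0, -3.0), 2))
```

In beam search, each partial hypothesis would be rescored this way before pruning.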
1 code implementation • 31 Mar 2022 • Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan
The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments.
no code implementations • 31 Mar 2022 • Keyu An, Ji Xiao, Zhijian Ou
In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel end-to-end ASR, namely back-end pre-training, data scheduling, and data simulation, under different settings such as the sizes of the single-channel data and the choices of the front-end.
1 code implementation • 2 Nov 2021 • Wenyu Zhu, Zhiyao Feng, Zihan Zhang, Jianjun Chen, Zhijian Ou, Min Yang, Chao Zhang
Recovering binary programs' call graphs is crucial for inter-procedural analysis tasks and applications based on them. One of the core challenges is recognizing targets of indirect calls (i.e., indirect callees).
2 code implementations • 9 Sep 2021 • Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng
In this paper, we propose Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches.
1 code implementation • 11 Jul 2021 • Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou
The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing in multilingual and crosslingual speech recognition for low-resourced languages.
1 code implementation • 7 Jul 2021 • Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang
Automatic speech recognition systems have improved greatly over the past few decades; current systems are mainly hybrid-based or end-to-end-based.
Automatic Speech Recognition (ASR) +1
no code implementations • 6 Jul 2021 • Keyu An, Zhijian Ou
Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition.
no code implementations • 30 Apr 2021 • Keyu An, Yi Zhang, Zhijian Ou
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems.
no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound; Audio and Speech Processing
1 code implementation • 11 Nov 2020 • Huahuan Zheng, Keyu An, Zhijian Ou
Using ST gradients to support sub-graph sampling is a core element to achieve efficient NAS beyond DARTS and SNAS.
Ranked #1 on Speech Recognition on WSJ dev93
Automatic Speech Recognition (ASR) +3
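The straight-through (ST) idea mentioned above can be sketched in a few lines: the forward pass commits to a hard (one-hot) sub-graph choice, while the backward pass pretends the choice was the differentiable softmax, so gradients still reach the architecture logits. Everything below (logits, the loss gradient `g`) is invented for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([1.0, 2.0, 0.5])   # architecture-choice logits (made up)
probs = softmax(logits)
hard = np.eye(len(logits))[np.argmax(probs)]  # forward: one sub-graph active

# Suppose the loss gradient w.r.t. the (hard) selection is g; ST backward
# pushes g through the softmax Jacobian instead of the non-differentiable
# argmax, giving a biased but usable gradient for the logits.
g = np.array([0.3, -0.1, 0.2])
jac = np.diag(probs) - np.outer(probs, probs)  # Jacobian of softmax
st_grad = jac @ g

print(hard.tolist())
```

Because only one sub-graph is active per forward pass, memory and compute scale with the sampled sub-graph rather than the whole supernet, which is the efficiency argument made for NAS beyond DARTS.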
no code implementations • 25 Oct 2020 • Yunfu Song, Huahuan Zheng, Zhijian Ou
In contrast, generative SSL methods involve unsupervised learning based on generative models by either joint-training or pre-training, and are more appealing from the perspective of being domain-agnostic, since they do not inherently require data augmentations.
1 code implementation • EMNLP 2020 • Yichi Zhang, Zhijian Ou, Huixin Wang, Junlan Feng
In this paper we aim at alleviating the reliance on belief state labels in building end-to-end dialog systems, by leveraging unlabeled dialog data towards semi-supervised learning.
Ranked #2 on End-To-End Dialogue Modelling on MULTIWOZ 2.1
1 code implementation • 28 May 2020 • Zhijian Ou, Yunfu Song
Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models remains challenging.
1 code implementation • 27 May 2020 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit).
Ranked #1 on Speech Recognition on Hub5'00 FISHER-SWBD
1 code implementation • ACL 2020 • Silin Gao, Yichi Zhang, Zhijian Ou, Zhou Yu
Neural generative models have achieved promising performance on dialog generation tasks if given a huge data set.
no code implementations • 14 Feb 2020 • Silin Gao, Zhijian Ou, Wei Yang, Huifang Xu
There has been a long recognition that discrete features (n-gram features) and neural network based features have complementary strengths for language models (LMs).
6 code implementations • 24 Nov 2019 • Yichi Zhang, Zhijian Ou, Zhou Yu
Conversations have an intrinsic one-to-many property, which means that multiple responses can be appropriate for the same dialog context.
Ranked #6 on End-To-End Dialogue Modelling on MULTIWOZ 2.0
2 code implementations • 20 Nov 2019 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit).
Automatic Speech Recognition (ASR) +1
1 code implementation • 16 Apr 2019 • Hongyu Xiang, Zhijian Ou
CTC-CRF is conceptually simple: it basically implements a CRF layer on top of features generated by the bottom neural network, with a special state topology.
Ranked #2 on Speech Recognition on WSJ eval93
no code implementations • 4 Nov 2018 • Kai Hu, Zhijian Ou, Min Hu, Junlan Feng
Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.
no code implementations • 4 Nov 2018 • Yinpei Dai, Yichi Zhang, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng
An ontology is defined by the collection of slots and the values that each slot can take.
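This definition of an ontology is easy to make concrete: a mapping from each slot to the set of values it licenses, against which candidate dialog-state entries can be checked. The slot names and values below are illustrative, not from any particular dataset:

```python
# A toy ontology: each slot maps to the set of values it can take.
ontology = {
    "price_range": {"cheap", "moderate", "expensive"},
    "area": {"north", "south", "centre"},
    "food": {"chinese", "italian", "indian"},
}

def is_valid(slot, value):
    """A candidate dialog-state entry is valid if the ontology licenses it."""
    return slot in ontology and value in ontology[slot]

print(is_valid("area", "north"))
print(is_valid("area", "uptown"))
```

Fixed-ontology trackers classify over these value sets, which is exactly why handling unseen slots or values is hard for them.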
no code implementations • 27 Sep 2018 • Yunfu Song, Zhijian Ou
Neural random fields (NRFs), which are defined by using neural networks to implement potential functions in undirected models, provide an interesting family of model spaces for machine learning.
no code implementations • 5 Aug 2018 • Zhijian Ou
This document aims to provide a review of learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence.
no code implementations • 13 Jul 2018 • Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin
In this paper, we present an end-to-end automatic speech recognition system, which successfully employs subword units in a hybrid CTC-Attention based system.
Automatic Speech Recognition (ASR) +2
1 code implementation • 3 Jul 2018 • Bin Wang, Zhijian Ou
First, a dynamic noise distribution is introduced and trained simultaneously to converge to the data distribution.
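The "dynamic noise distribution" idea can be sketched on a 1-D toy problem. In standard NCE the noise distribution is fixed; here its parameters are repeatedly updated toward the data. The moment-matching update below is a simple stand-in for the paper's simultaneous training, and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # toy 1-D "data" distribution

# Dynamic noise distribution: a Gaussian whose parameters are re-estimated
# during training so that it converges toward the data distribution.
noise_mu, noise_sigma = 0.0, 3.0
for _ in range(5):
    # In full NCE, a model is also trained to discriminate data (label 1)
    # from noise samples (label 0); only the noise update is shown here.
    noise_mu += 0.5 * (data.mean() - noise_mu)
    noise_sigma += 0.5 * (data.std() - noise_sigma)

print(round(float(noise_mu), 2), round(float(noise_sigma), 2))
```

As the noise closes in on the data, the data-vs-noise classification task gets harder, which is what makes the training signal more informative than with a fixed, easily separable noise distribution.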
1 code implementation • 1 Jun 2018 • Yunfu Song, Zhijian Ou
With these contributions and results, this paper significantly advances the learning and applications of NRFs, both theoretically and empirically, to a level not previously attained.
no code implementations • ICLR 2018 • Yichi Zhang, Zhijian Ou
An ensemble of neural networks is known to be more robust and accurate than an individual network, but usually with linearly increased cost in both training and testing.
no code implementations • 9 Nov 2017 • Yinpei Dai, Zhijian Ou, Dawei Ren, Pengfei Yu
The above observations motivate us to enrich current representation of dialog states and collect a brand new dialog dataset about movies, based upon which we build a new DST, called enriched DST (EDST), for flexible accessing movie information.
no code implementations • 30 Oct 2017 • Bin Wang, Zhijian Ou
However, the training efficiency of neural TRF LMs is not satisfactory, which limits the scalability of TRF LMs to large training corpora.
no code implementations • 23 Jul 2017 • Bin Wang, Zhijian Ou
The idea is to use nonlinear potentials with continuous features, implemented by neural networks (NNs), in the TRF framework.
no code implementations • 13 Dec 2016 • Yiyan Wang, Haotian Xu, Zhijian Ou
State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis.
no code implementations • 30 Mar 2016 • Bin Wang, Zhijian Ou, Yong He, Akinori Kawamura
The dominant language models (LMs) such as n-gram and neural network (NN) models represent sentence probabilities in terms of conditionals.
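The "sentence probability in terms of conditionals" formulation is just the chain rule: log p(w_1..w_T) = sum_t log p(w_t | w_{<t}). A minimal bigram sketch (all probabilities below are made up):

```python
import math

# Toy bigram conditionals p(w_t | w_{t-1}); the probabilities are invented.
bigram = {
    ("<s>", "the"): 0.4,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3,
    ("sat", "</s>"): 0.5,
}

def sentence_logprob(words):
    """Chain rule: log p(sentence) = sum over t of log p(w_t | w_{t-1})."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(bigram[(a, b)]) for a, b in zip(tokens, tokens[1:]))

lp = sentence_logprob(["the", "cat", "sat"])
print(round(lp, 2))
```

Trans-dimensional random field (TRF) LMs, by contrast, score the whole sentence at once with an unnormalized potential, avoiding this left-to-right factorization.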
no code implementations • 20 Mar 2016 • Haotian Xu, Zhijian Ou
Despite progress, model learning and posterior inference remain common challenges for deep generative models, especially those with discrete hidden variables.
no code implementations • 20 Mar 2015 • Jinye Zhang, Zhijian Ou
Existing MAP inference algorithms for determinantal point processes (DPPs) need to calculate determinants or conduct eigenvalue decomposition generally at the scale of the full kernel, which presents a great challenge for real-world applications.
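The determinant computations at issue appear even in the standard greedy baseline for DPP MAP inference: each step adds the item that yields the largest log-determinant of the selected kernel submatrix. A small sketch (the kernel below is a made-up PSD matrix; real implementations use incremental Cholesky updates rather than recomputing determinants):

```python
import numpy as np

def greedy_dpp_map(L, k):
    """Greedy MAP for a DPP with kernel L: at each step, add the item giving
    the largest log-determinant of the selected submatrix."""
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# Toy PSD kernel: items 0 and 1 are near-duplicates, so a DPP avoids
# picking them together and prefers the diverse pair {0, 2}.
B = np.array([[1.0, 0.0], [0.95, 0.1], [0.0, 1.0]])
L = B @ B.T + 1e-6 * np.eye(3)
print(greedy_dpp_map(L, 2))
```

Since every candidate evaluation touches a determinant over the selected set, the cost grows quickly with the kernel size, which is the scalability problem the paper targets.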