Search Results for author: Zhijian Ou

Found 48 papers, 24 papers with code

Energy-Based Models with Applications to Speech and Language Processing

no code implementations 16 Mar 2024 Zhijian Ou

Therefore, the purpose of this monograph is to present a systematic introduction to energy-based models, including both algorithmic progress and applications in speech and language processing.

Language Modelling Natural Language Understanding +3

Prompt Pool based Class-Incremental Continual Learning for Dialog State Tracking

1 code implementation 17 Nov 2023 Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng

Inspired by the recently emerging prompt tuning method that performs well on dialog systems, we propose to use the prompt pool method, where we maintain a pool of key-value paired prompts and select prompts from the pool according to the distance between the dialog history and the prompt keys.

Continual Learning dialog state tracking

UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt

no code implementations 20 Sep 2023 Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li

Using the high-quality prompts generated, we scale the corpus of the pre-trained conversation model to 122 datasets from 15 dialog-related tasks, resulting in Universal Pre-trained Conversation Model (UniPCM), a powerful foundation model for various conversational tasks and different dialog systems.

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

1 code implementation 22 May 2023 Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao

Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs).

Sentence speech-recognition +1

Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision

1 code implementation 22 May 2023 Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database to get relevant knowledge to generate responses.

Question Answering Retrieval

Persistently Trained, Diffusion-assisted Energy-based Models

1 code implementation 21 Apr 2023 Xinwei Zhang, Zhiqiang Tan, Zhijian Ou

Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo. Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation.

Density Estimation Image Generation +1

A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems

1 code implementation 17 Oct 2022 Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

Second, an important ingredient in a US is that the user goal can be effectively incorporated and tracked; but how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multiple domains has remained a challenge.

Reinforcement Learning (RL)

Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

no code implementations 13 Oct 2022 Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems.

Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset

1 code implementation 27 Sep 2022 Hong Liu, Hao Peng, Zhijian Ou, Juanzi Li, Yi Huang, Junlan Feng

Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games.

Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models

1 code implementation SIGDIAL (ACL) 2022 Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

In this paper, we propose to apply JSA to semi-supervised learning of the latent state TOD models, which is referred to as JSA-TOD.

A Challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems

1 code implementation 6 Jul 2022 Zhijian Ou, Junlan Feng, Juanzi Li, Yakun Li, Hong Liu, Hao Peng, Yi Huang, Jiangjiang Zhao

A challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems, co-located with the EMNLP 2022 SereTOD Workshop.

Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

2 code implementations 13 Apr 2022 Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

Recently, Transformer based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems.

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

1 code implementation 31 Mar 2022 Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan

The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments.

Chunking speech-recognition +1

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

no code implementations 31 Mar 2022 Keyu An, Ji Xiao, Zhijian Ou

In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel end-to-end ASR, namely back-end pre-training, data scheduling, and data simulation, under different settings such as the sizes of the single-channel data and the choices of the front-end.

Scheduling speech-recognition +1

An Empirical Study of Language Model Integration for Transducer based Speech Recognition

no code implementations 31 Mar 2022 Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

Based on the DR method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model.

Language Modelling speech-recognition +1

Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning

1 code implementation 2 Nov 2021 Wenyu Zhu, Zhiyao Feng, Zihan Zhang, Jianjun Chen, Zhijian Ou, Min Yang, Chao Zhang

Recovering binary programs' call graphs is crucial for inter-procedural analysis tasks and applications based on them. One of the core challenges is recognizing targets of indirect calls (i.e., indirect callees).

Contrastive Learning Question Answering +1

Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

2 code implementations 9 Sep 2021 Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng

In this paper, we propose Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches.

Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

1 code implementation 11 Jul 2021 Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou

The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages.

speech-recognition Speech Recognition

Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

1 code implementation 7 Jul 2021 Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang

Automatic speech recognition systems have been largely improved in the past few decades and current systems are mainly hybrid-based and end-to-end-based.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Exploiting Single-Channel Speech For Multi-channel End-to-end Speech Recognition

no code implementations 6 Jul 2021 Keyu An, Zhijian Ou

Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition.

Data Augmentation Scheduling +2

Deformable TDNN with adaptive receptive fields for speech recognition

no code implementations 30 Apr 2021 Keyu An, Yi Zhang, Zhijian Ou

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems.

speech-recognition Speech Recognition

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

no code implementations 13 Nov 2020 Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.

Sound Audio and Speech Processing

An empirical study of domain-agnostic semi-supervised learning via energy-based models: joint-training and pre-training

no code implementations 25 Oct 2020 Yunfu Song, Huahuan Zheng, Zhijian Ou

In contrast, generative SSL methods involve unsupervised learning based on generative models by either joint-training or pre-training, and are more appealing from the perspective of being domain-agnostic, since they do not inherently require data augmentations.

Image Classification

A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning

1 code implementation EMNLP 2020 Yichi Zhang, Zhijian Ou, Huixin Wang, Junlan Feng

In this paper we aim at alleviating the reliance on belief state labels in building end-to-end dialog systems, by leveraging unlabeled dialog data towards semi-supervised learning.

End-To-End Dialogue Modelling

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models

1 code implementation 28 May 2020 Zhijian Ou, Yunfu Song

Despite progress in introducing auxiliary amortized inference models, learning discrete latent variable models is still challenging.

Structured Prediction

Paraphrase Augmented Task-Oriented Dialog Generation

1 code implementation ACL 2020 Silin Gao, Yichi Zhang, Zhijian Ou, Zhou Yu

Neural generative models have achieved promising performance on dialog generation tasks if given a huge data set.

Data Augmentation Response Generation

Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models

no code implementations 14 Feb 2020 Silin Gao, Zhijian Ou, Wei Yang, Huifang Xu

It has long been recognized that discrete features (n-gram features) and neural network based features have complementary strengths for language models (LMs).

speech-recognition Speech Recognition

CAT: CRF-based ASR Toolkit

2 code implementations 20 Nov 2019 Keyu An, Hongyu Xiang, Zhijian Ou

In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

CRF-based Single-stage Acoustic Modeling with CTC Topology

1 code implementation 16 Apr 2019 Hongyu Xiang, Zhijian Ou

CTC-CRF is conceptually simple: it basically implements a CRF layer on top of features generated by the bottom neural network, with the special state topology.

Benchmarking Speech Recognition

Neural CRF transducers for sequence labeling

no code implementations 4 Nov 2018 Kai Hu, Zhijian Ou, Min Hu, Junlan Feng

Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.

Chunking NER +2

Elastic CRFs for Open-ontology Slot Filling

no code implementations 4 Nov 2018 Yinpei Dai, Yichi Zhang, Zhijian Ou, Yanmeng Wang, Junlan Feng

Second, the one-hot encoding of slot labels ignores the semantic meanings and relations for slots, which are implicit in their natural language descriptions.

slot-filling Slot Filling

Learning Neural Random Fields with Inclusive Auxiliary Generators

no code implementations 27 Sep 2018 Yunfu Song, Zhijian Ou

Neural random fields (NRFs), which are defined by using neural networks to implement potential functions in undirected models, provide an interesting family of model spaces for machine learning.

Image Generation

A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling

no code implementations 5 Aug 2018 Zhijian Ou

This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence.

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

no code implementations 13 Jul 2018 Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin

In this paper, we present an end-to-end automatic speech recognition system, which successfully employs subword units in a hybrid CTC-Attention based system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation

1 code implementation 3 Jul 2018 Bin Wang, Zhijian Ou

First, a dynamic noise distribution is introduced and trained simultaneously to converge to the data distribution.

Language Modelling Sentence

Generative Modeling by Inclusive Neural Random Fields with Applications in Image Generation and Anomaly Detection

1 code implementation 1 Jun 2018 Yunfu Song, Zhijian Ou

With these contributions and results, this paper significantly advances the learning and applications of NRFs to a new level, both theoretically and empirically, which have never been obtained before.

Anomaly Detection Image Generation

Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning

no code implementations ICLR 2018 Yichi Zhang, Zhijian Ou

An ensemble of neural networks is known to be more robust and accurate than an individual network, but usually at a linearly increased cost in both training and testing.

Language Modelling Network Pruning

Tracking of enriched dialog states for flexible conversational information access

no code implementations 9 Nov 2017 Yinpei Dai, Zhijian Ou, Dawei Ren, Pengfei Yu

The above observations motivate us to enrich the current representation of dialog states and collect a brand new dialog dataset about movies, based upon which we build a new DST, called enriched DST (EDST), for flexible access to movie information.

dialog state tracking slot-filling +1

Learning neural trans-dimensional random field language models with noise-contrastive estimation

no code implementations 30 Oct 2017 Bin Wang, Zhijian Ou

However, the training efficiency of neural TRF LMs is not satisfactory, which limits the scalability of TRF LMs to large training corpora.

speech-recognition Speech Recognition

Language modeling with Neural trans-dimensional random fields

no code implementations 23 Jul 2017 Bin Wang, Zhijian Ou

The idea is to use nonlinear potentials with continuous features, implemented by neural networks (NNs), in the TRF framework.

Language Modelling speech-recognition +1

Joint Bayesian Gaussian discriminant analysis for speaker verification

no code implementations 13 Dec 2016 Yiyan Wang, Haotian Xu, Zhijian Ou

State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis.

Face Verification Speaker Verification

Model Interpolation with Trans-dimensional Random Field Language Models for Speech Recognition

no code implementations 30 Mar 2016 Bin Wang, Zhijian Ou, Yong He, Akinori Kawamura

The dominant language models (LMs) such as n-gram and neural network (NN) models represent sentence probabilities in terms of conditionals.

Sentence speech-recognition +1

Joint Stochastic Approximation learning of Helmholtz Machines

no code implementations 20 Mar 2016 Haotian Xu, Zhijian Ou

Despite progress, model learning and posterior inference still remain a common challenge for using deep generative models, especially for handling discrete hidden variables.

Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection

no code implementations 20 Mar 2015 Jinye Zhang, Zhijian Ou

Existing MAP inference algorithms for determinantal point processes (DPPs) need to calculate determinants or conduct eigenvalue decomposition generally at the scale of the full kernel, which presents a great challenge for real-world applications.

Change Point Detection Point Processes
