Search Results for author: Kai Yu

Found 98 papers, 33 papers with code

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

1 code implementation 25 May 2023 Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu

Furthermore, we present CSS, a large-scale CrosS-Schema Chinese text-to-SQL dataset, to carry on corresponding studies.

Benchmarking Text-To-SQL

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation 19 May 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

3D Part Segmentation Few-Shot 3D Point Cloud Classification +1

Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction

1 code implementation 14 May 2023 Danyang Zhang, Lu Chen, Kai Yu

To help the research of InfoUI interaction, a novel platform Mobile-Env is presented in this paper.

Language Modelling

DiffVoice: Text-to-Speech with Latent Diffusion

no code implementations 23 Apr 2023 Zhijun Liu, Yiwei Guo, Kai Yu

In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion.

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

no code implementations 30 Mar 2023 Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian

Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly.

Talking Face Generation

TrFedDis: Trusted Federated Disentangling Network for Non-IID Domain Feature

1 code implementation 30 Jan 2023 Meng Wang, Kai Yu, Chun-Mei Feng, Yiming Qian, Ke Zou, Lianyu Wang, Rick Siow Mong Goh, Xinxing Xu, Yong Liu, Huazhu Fu

To the best of our knowledge, our proposed TrFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling, which enhances the performance and reliability of FL in non-IID domain features.

Federated Learning

On the Structural Generalization in Text-to-SQL

no code implementations 12 Jan 2023 Jieyu Li, Lu Chen, Ruisheng Cao, Su Zhu, Hongshen Xu, Zhi Chen, Hanchong Zhang, Kai Yu

Exploring the generalization of a text-to-SQL parser is essential for a system to automatically adapt to real-world databases.

Text-To-SQL

Spectral Efficiency Analysis of Uplink-Downlink Decoupled Access in C-V2X Networks

1 code implementation 5 Dec 2022 Luofang Jiao, Kai Yu, Yunting Xu, Tianqi Zhang, Haibo Zhou, Xuemin Shen

The uplink (UL)/downlink (DL) decoupled access has been emerging as a novel access architecture to improve the performance gains in cellular networks.

Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images

1 code implementation 1 Dec 2022 Meng Wang, Kai Yu, Chun-Mei Feng, Ke Zou, Yanyu Xu, Qingquan Meng, Rick Siow Mong Goh, Yong Liu, Xinxing Xu, Huazhu Fu

Specifically, aiming at improving the model's ability to learn the complex pathological features of retinal edema lesions in OCT images, we develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a newly designed multi-scale transformer module.

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

no code implementations 17 Nov 2022 Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu

Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the value of the specified emotion and Neutral is set to $\alpha$ and $1-\alpha$ respectively.

Denoising

BER: Balanced Error Rate For Speaker Diarization

2 code implementations 8 Nov 2022 Tao Liu, Kai Yu

DER is the primary metric to evaluate diarization performance while facing a dilemma: the errors in short utterances or segments tend to be overwhelmed by longer ones.

speaker-diarization Speaker Diarization

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat

no code implementations 24 May 2022 Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu

In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms based on clinical diagnosis criteria.

Response Generation

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI

no code implementations 23 May 2022 Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu

However, this API-based architecture greatly limits the information-searching capability of intelligent assistants and may even lead to task failure if TOD-specific APIs are not available or the task is too complicated to be executed by the provided APIs.

Scheduling

Climate and Weather: Inspecting Depression Detection via Emotion Recognition

no code implementations 29 Apr 2022 Wen Wu, Mengyue Wu, Kai Yu

Automatic depression detection has attracted an increasing amount of attention but remains a challenging task.

Depression Detection Emotion Recognition

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature

no code implementations 2 Apr 2022 Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu

The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates a waveform according to the given acoustic features.

Speech Synthesis Text-To-Speech Synthesis

Audio-text Retrieval in Context

no code implementations 25 Mar 2022 Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu

Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.

Retrieval Text Retrieval

Unsupervised word-level prosody tagging for controllable speech synthesis

no code implementations 15 Feb 2022 Yiwei Guo, Chenpeng Du, Kai Yu

Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference.

Speech Synthesis

Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF

no code implementations 9 Dec 2021 Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu

In this paper, we propose to improve prototypical networks with vector projection distance and abstract triangular Conditional Random Field (CRF) for the few-shot NLU.

intent-classification Intent Classification +5

Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

1 code implementation 3 Sep 2021 Chun-Mei Feng, Yunlu Yan, Kai Yu, Yong Xu, Ling Shao, Huazhu Fu

Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image.

Image Super-Resolution

THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING

1 code implementation DCASE Challenge 2021 Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu

This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.

Ranked #2 on Audio captioning on Clotho (using extra training data)

Audio captioning Audio Tagging +2

ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser

no code implementations NAACL 2021 Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, Kai Yu

Given a database schema, Text-to-SQL aims to translate a natural language question into the corresponding SQL query.

Semantic Parsing Text-To-SQL

Quantum Dimensionality Reduction by Linear Discriminant Analysis

no code implementations 4 Mar 2021 Kai Yu, Gong-De Guo, Song Lin

In this paper, we present a quantum algorithm and a quantum circuit to efficiently perform linear discriminant analysis (LDA) for dimensionality reduction.

Dimensionality Reduction Quantum Physics

LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching

1 code implementation 25 Feb 2021 Boer Lyu, Lu Chen, Su Zhu, Kai Yu

Additionally, we adopt the word lattice graph as input to maintain multi-granularity information.

Text Matching

Rich Prosody Diversity Modelling with Phone-level Mixture Density Network

2 code implementations 1 Feb 2021 Chenpeng Du, Kai Yu

Generating natural speech with diverse and smooth prosody patterns is a challenging task.

Speech Synthesis Text-To-Speech Synthesis Sound

Towards duration robust weakly supervised sound event detection

1 code implementation 19 Jan 2021 Heinrich Dinkel, Mengyue Wu, Kai Yu

Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.

Data Augmentation Sound Event Detection Sound Audio and Speech Processing

A relic sketch extraction framework based on detail-aware hierarchical deep network

no code implementations 17 Jan 2021 Jinye Peng, Jiaxin Wang, Jun Wang, Erlei Zhang, Qunxi Zhang, Yongqin Zhang, Xianlin Peng, Kai Yu

For the fine extraction stage, we design a new multiscale U-Net (MSU-Net) to effectively remove disease noise and refine the sketch.

Edge Detection Transfer Learning

A 3D Non-stationary MmWave Channel Model for Vacuum Tube Ultra-High-Speed Train Channels

no code implementations 17 Jan 2021 YingJie Xu, Kai Yu, Li Li, Xianfu Lei, Li Hao, Cheng-Xiang Wang

As a potential development direction of future transportation, the vacuum tube ultra-high-speed train (UHST) wireless communication systems have newly different channel characteristics from existing high-speed train (HST) scenarios.

An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models

no code implementations 14 Oct 2020 Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu

Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks.

Quantization

CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking

no code implementations 22 Sep 2020 Zhi Chen, Lu Chen, Zihan Xu, Yanbin Zhao, Su Zhu, Kai Yu

In dialogue systems, a dialogue state tracker aims to accurately find a compact representation of the current dialogue status, based on the entire dialogue history.

Dialogue State Tracking

Dual Learning for Dialogue State Tracking

no code implementations 22 Sep 2020 Zhi Chen, Lu Chen, Yanbin Zhao, Su Zhu, Kai Yu

In task-oriented multi-turn dialogue systems, dialogue state refers to a compact representation of the user goal in the context of dialogue history.

Dialogue State Tracking

Structured Hierarchical Dialogue Policy with Graph Neural Networks

no code implementations 22 Sep 2020 Zhi Chen, Xiaoyuan Liu, Lu Chen, Kai Yu

A novel ComNet is proposed to model the structure of a hierarchical agent.

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

no code implementations 22 Sep 2020 Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu

The task-oriented spoken dialogue system (SDS) aims to assist a human user in accomplishing a specific task (e.g., hotel booking).

Decision Making Dialogue Management +3

Deep Reinforcement Learning for On-line Dialogue State Tracking

no code implementations 22 Sep 2020 Zhi Chen, Lu Chen, Xiang Zhou, Kai Yu

To the best of our knowledge, this is the first effort to optimize the DST module within a DRL framework for on-line task-oriented spoken dialogue systems.

Dialogue Management Dialogue State Tracking +4

Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding

1 code implementation 21 Sep 2020 Su Zhu, Ruisheng Cao, Lu Chen, Kai Yu

Few-shot slot tagging becomes appealing for rapid domain transfer and adaptation, motivated by the tremendous development of conversational dialogue systems.

Few-Shot Learning Natural Language Understanding +2

An Investigation on Deep Learning with Beta Stabilizer

no code implementations 31 Jul 2020 Qi Liu, Tian Tan, Kai Yu

It is concluded that beta stabilizer parameters can reduce the sensitivity to the learning rate with almost the same performance on DNNs with the ReLU activation function and on LSTMs.

Handwriting Recognition speech-recognition +1

Future Vector Enhanced LSTM Language Model for LVCSR

no code implementations 31 Jul 2020 Qi Liu, Yanmin Qian, Kai Yu

For speech recognition rescoring, although the proposed LSTM LM obtains only very slight gains, the new model appears to be highly complementary to the conventional LSTM LM.

Language Modelling speech-recognition +1

Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding

1 code implementation 24 May 2020 Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu

In this paper, a novel BERT based SLU model (WCN-BERT SLU) is proposed to encode WCNs and the dialogue context jointly.

Spoken Language Understanding

Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders

no code implementations 30 Apr 2020 Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu

When modeling simple and complex sentences with autoencoders, we introduce different types of noise into the training process.

Denoising Language Modelling +4

Dual Learning for Semi-Supervised Natural Language Understanding

2 code implementations 26 Apr 2020 Su Zhu, Ruisheng Cao, Kai Yu

The framework is composed of dual pseudo-labeling and dual learning method, which enables an NLU model to make full use of data (labeled and unlabeled) through a closed-loop of the primal and dual tasks.

Natural Language Understanding

Voice activity detection in the wild via weakly supervised sound event detection

1 code implementation 27 Mar 2020 Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu

We proposed two GPVAD models, one full (GPV-F), trained on 527 Audioset sound events, and one binary (GPV-B), only distinguishing speech and noise.

Sound Audio and Speech Processing

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

no code implementations 18 Jun 2019 Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

The proposed approach can achieve the state-of-the-art performance, with 25% ~ 30% equal error rate (EER) reduction on both tasks when compared to strong baselines using cross entropy loss with softmax, obtaining 2.238% EER on VoxCeleb1 test set and 2.761% EER on SITW core-core test set, respectively.

Speaker Recognition

Audio Caption in a Car Setting with a Sentence-Level Loss

1 code implementation 31 May 2019 Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Captioning has attracted much attention in image and video understanding while a small amount of work examines audio captioning.

Audio captioning Semantic Similarity +4

AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning

no code implementations 27 May 2019 Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu

Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark.

Dialogue Management Management +4

A Hierarchical Decoding Model For Spoken Language Understanding From Unaligned Data

1 code implementation 9 Apr 2019 Zijian Zhao, Su Zhu, Kai Yu

In the paper, we focus on spoken language understanding from unaligned data whose annotation is a set of act-slot-value triples.

Spoken Language Understanding

Duration robust sound event detection

1 code implementation 8 Apr 2019 Heinrich Dinkel, Kai Yu

Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed for a real-world application of sound event detection.

Sound Audio and Speech Processing

Text-based depression detection on sparse data

1 code implementation 8 Apr 2019 Heinrich Dinkel, Mengyue Wu, Kai Yu

Previous text-based depression detection is commonly based on large user-generated data.

Depression Detection Word Embeddings

Audio Caption: Listen and Tell

1 code implementation 25 Feb 2019 Mengyue Wu, Heinrich Dinkel, Kai Yu

A baseline encoder-decoder model is provided for both English and Mandarin.

General Classification

End-to-End Monaural Multi-speaker ASR System without Pretraining

no code implementations 5 Nov 2018 Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe

The experiments demonstrate that the proposed methods can improve the performance of the end-to-end model in separating the overlapping speech and recognizing the separated streams.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

no code implementations 2 Aug 2018 Zhehuai Chen, Yanmin Qian, Kai Yu

The few studies on sequence discriminative training for KWS are limited for fixed vocabulary or LVCSR based methods and have not been compared to the state-of-the-art deep learning based KWS approaches.

Keyword Spotting speech-recognition +1

Structured Dialogue Policy with Graph Neural Networks

no code implementations COLING 2018 Lu Chen, Bowen Tan, Sishan Long, Kai Yu

The proposed structured deep reinforcement learning is based on graph neural networks (GNN), which consists of some sub-networks, each one for a node on a directed graph.

Automatic Speech Recognition (ASR) Decision Making +5

Binarized LSTM Language Model

no code implementations NAACL 2018 Xuan Liu, Di Cao, Kai Yu

Although excellent performance is obtained for large vocabulary tasks, tremendous memory consumption prohibits the use of LSTM LM in low-resource devices.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

On Modular Training of Neural Acoustics-to-Word Model for LVCSR

no code implementations 3 Mar 2018 Zhehuai Chen, Qi Liu, Hao Li, Kai Yu

Finally, modules are integrated into an acoustics-to-word model (A2W) and jointly optimized using acoustic data to retain the advantage of sequence modeling.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Affordable On-line Dialogue Policy Learning

no code implementations EMNLP 2017 Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou, Kai Yu

The key to building an evolvable dialogue system in real-world scenarios is to ensure an affordable on-line dialogue policy learning, which requires the on-line learning process to be safe, efficient and economical.

Dialogue Management

Concept Transfer Learning for Adaptive Language Understanding

no code implementations WS 2018 Su Zhu, Kai Yu

Concept definition is important in language understanding (LU) adaptation since literal definition difference can easily lead to data sparsity even if different data sets are actually semantically correlated.

Domain Adaptation Transfer Learning

On-line Dialogue Policy Learning with Companion Teaching

no code implementations EACL 2017 Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou, Kai Yu

On-line dialogue policy learning is the key to building evolvable conversational agents in real-world scenarios.

Dialogue Management

A Large-scale Distributed Video Parsing and Evaluation Platform

no code implementations 29 Nov 2016 Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang

Visual surveillance systems have become one of the largest data sources of Big Visual Data in the real world.

Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization

no code implementations 17 Nov 2016 Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li, Kaiqi Huang

Based on GoogLeNet, firstly, a set of mid-level attribute features are discovered by novelly designed detection layers, where a max-pooling based weakly-supervised object detection technique is used to train these layers with only image-level labels without the need of bounding box annotations of pedestrian attributes.

Multi-Label Image Classification object-detection +3

Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken Language Understanding

no code implementations 6 Aug 2016 Su Zhu, Kai Yu

This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding.

speech-recognition Speech Recognition +1

Text Flow: A Unified Text Detection System in Natural Scene Images

no code implementations ICCV 2015 Shangxuan Tian, Yifeng Pan, Chang Huang, Shijian Lu, Kai Yu, Chew Lim Tan

With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively.

Scene Text Detection Text-Line Extraction

On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation

1 code implementation 19 Feb 2016 Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu

We propose to train a bi-directional neural network language model (NNLM) with noise contrastive estimation (NCE).

Language Modelling

Recurrent Polynomial Network for Dialogue State Tracking

no code implementations 14 Jul 2015 Kai Sun, Qizhe Xie, Kai Yu

Dialogue state tracking (DST) is a process to estimate the distribution of the dialogue states as a dialogue progresses.

dialog state tracking Dialogue State Tracking

Deep Multiple Instance Learning for Image Classification and Auto-Annotation

no code implementations CVPR 2015 Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu

The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection.

Classification General Classification +3

Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval

1 code implementation 25 Dec 2012 Chang Huang, Shenghuo Zhu, Kai Yu

Learning Mahalanobis distance metrics in a high-dimensional feature space is very difficult, especially when structural sparsity and low rank are enforced to improve computational efficiency in the testing phase.

Face Verification Metric Learning +1

Deep Coding Network

no code implementations NeurIPS 2010 Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu

This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding.

Nonlinear Learning using Local Coordinate Coding

no code implementations NeurIPS 2009 Kai Yu, Tong Zhang, Yihong Gong

This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning.

Deep Learning with Kernel Regularization for Visual Recognition

no code implementations NeurIPS 2008 Kai Yu, Wei Xu, Yihong Gong

In this paper we focus on training deep neural networks for visual recognition tasks.

Stochastic Relational Models for Large-scale Dyadic Data using MCMC

no code implementations NeurIPS 2008 Shenghuo Zhu, Kai Yu, Yihong Gong

Stochastic relational models provide a rich family of choices for learning and predicting dyadic data between two sets of entities.

Bayesian Inference Collaborative Filtering

Gaussian Process Models for Link Analysis and Transfer Learning

no code implementations NeurIPS 2007 Kai Yu, Wei Chu

In this paper we develop a Gaussian process (GP) framework to model a collection of reciprocal random variables defined on the edges of a network.

Link Prediction Transfer Learning

Predictive Matrix-Variate t Models

no code implementations NeurIPS 2007 Shenghuo Zhu, Kai Yu, Yihong Gong

It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements.

Missing Elements Model Selection
