Search Results for author: Kaitao Song

Found 39 papers, 21 papers with code

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

1 code implementation20 Jun 2024 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

In this paper, we introduce EvoAgent, a generic method to automatically extend expert agents to multi-agent systems via the evolutionary algorithm, thereby improving the effectiveness of LLM-based agents in solving tasks.

Evolutionary Algorithms

TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

no code implementations7 Jun 2024 Ping Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

Moreover, we also analyze the robustness of current LLMs in solving TCM QA tasks by introducing randomness.

Question Answering

Can Graph Learning Improve Task Planning?

no code implementations29 May 2024 Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

Task planning is emerging as an important research topic alongside the development of large language models (LLMs).

Decision Making Graph Learning +2

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations5 Mar 2024 Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization Speech Synthesis

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model

no code implementations11 Jan 2024 Yuqi Chen, Kan Ren, Kaitao Song, Yansen Wang, Yifan Wang, Dongsheng Li, Lili Qiu

Self-supervised learning has emerged as a highly effective approach in the fields of natural language processing and computer vision.

Anomaly Detection EEG +2

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

1 code implementation11 Jan 2024 Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang

EasyTool purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i. e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents.

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

1 code implementation18 Oct 2023 Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

For developers and amateurs, it is very difficult to grasp all of these task to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks.

AI Agent Music Classification

Learning To Teach Large Language Models Logical Reasoning

1 code implementation13 Oct 2023 Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, Dongsheng Li

Large language models (LLMs) have gained enormous attention from both academia and industry, due to their exceptional ability in language generation and extremely powerful generalization.

counterfactual Event Relation Extraction +4

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

1 code implementation15 Sep 2023 Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort.

Evolutionary Algorithms

PromptTTS 2: Describing and Generating Voices with Text Prompt

no code implementations5 Sep 2023 Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech.

Language Modelling Large Language Model

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

no code implementations5 Jun 2023 Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu

Pronunciation assessment is a major challenge in the computer-aided pronunciation training system, especially at the word (phoneme)-level.

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

no code implementations31 May 2023 Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu

Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts.

Text Generation

DiffusionNER: Boundary Diffusion for Named Entity Recognition

2 code implementations22 May 2023 Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans.

Chinese Named Entity Recognition Denoising +4

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

1 code implementation NeurIPS 2023 Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence.

Philosophy

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

1 code implementation14 Mar 2023 Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng

Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.

A-VB Culture A-VB High +6

Learning Domain Invariant Prompt for Vision-Language Models

1 code implementation8 Dec 2022 Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples.

Domain Generalization Language Modelling +2

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

1 code implementation2 Dec 2022 Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Towards Understanding Omission in Dialogue Summarization

1 code implementation14 Nov 2022 Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li, Tao Gui

By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization.

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

no code implementations10 Aug 2022 Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Specifically, we first pre-train an encoder-decoder framework in an automatic speech recognition (ASR) objective by using speech-to-text dataset, and then fine-tune ASR encoder on the cleft palate dataset for hypernasality estimation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

1 code implementation25 May 2022 Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li

Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost.

Causal Language Modeling Language Modelling +2

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations31 Mar 2022 Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input.

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations8 Oct 2021 Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

Analyzing and Mitigating Interference in Neural Architecture Search

no code implementations29 Aug 2021 Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li

In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar.

Neural Architecture Search Reading Comprehension

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

1 code implementation ACL 2021 Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu

In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.

Language Modelling

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

10 code implementations ICCV 2021 Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently-proposed Transformer model (e. g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer~(PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Image Classification Instance Segmentation +3

Task-Agnostic and Adaptive-Size BERT Compression

no code implementations1 Jan 2021 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Li Jian, Tao Qin, Tie-Yan Liu

NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.

Language Modelling Model Compression +1

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

1 code implementation9 Dec 2020 Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry.

Sentence

Neural Machine Translation with Error Correction

1 code implementation21 Jul 2020 Kaitao Song, Xu Tan, Jianfeng Lu

Neural machine translation (NMT) generates the next target token given as input the previous ground truth target tokens during training while the previous generated target tokens during inference, which causes discrepancy between training and inference as well as error propagation, and affects the translation accuracy.

Decoder Machine Translation +2

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

no code implementations27 Apr 2020 Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu

While pre-training and fine-tuning, e. g., BERT~\citep{devlin2018bert}, GPT-2~\citep{radford2019language}, have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage.

Knowledge Distillation Language Modelling

MPNet: Masked and Permuted Pre-training for Language Understanding

6 code implementations NeurIPS 2020 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.

Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Language Modelling Masked Language Modeling +3

MASS: Masked Sequence to Sequence Pre-training for Language Generation

7 code implementations7 May 2019 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Pre-training and fine-tuning, e. g., BERT, have achieved great success in language understanding by transferring knowledge from rich-resource pre-training task to the low/zero-resource downstream tasks.

Conversational Response Generation Decoder +6

Generating Adversarial Examples With Conditional Generative Adversarial Net

no code implementations18 Mar 2019 Ping Yu, Kaitao Song, Jianfeng Lu

Recently, deep neural networks have significant progress and successful application in various fields, but they are found vulnerable to attack instances, e. g., adversarial examples.

Generative Adversarial Network

Hybrid Self-Attention Network for Machine Translation

no code implementations1 Nov 2018 Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu

The encoder-decoder is the typical framework for Neural Machine Translation (NMT), and different structures have been developed for improving the translation performance.

Decoder Machine Translation +2

Double Path Networks for Sequence to Sequence Learning

1 code implementation COLING 2018 Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both models by using double path information fusion.

Decoder

Cannot find the paper you are looking for? You can Submit a new open access paper.