Search Results for author: Kaitao Song

Found 36 papers, 20 papers with code

MPNet: Masked and Permuted Pre-training for Language Understanding

6 code implementations • NeurIPS 2020 • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.

Ranked #16 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Language Modelling Masked Language Modeling +3

124,984
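
To make the PLM idea in the MPNet snippet concrete, here is a minimal Python sketch. It is an illustration, not code from MPNet or XLNet. It contrasts what each predicted token can condition on under BERT-style masking versus under a permuted factorization order. The function names and the example order are hypothetical, and the sketch omits the real models' machinery, such as XLNet's two-stream attention.

```python
from typing import Dict, List, Sequence


def plm_attention_mask(perm: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. when
    j comes strictly earlier than i in the factorization order `perm`."""
    rank = {pos: step for step, pos in enumerate(perm)}  # position -> step in order
    n = len(perm)
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


def plm_context(perm: Sequence[int], num_predict: int) -> Dict[int, List[int]]:
    """PLM: predict the last `num_predict` positions of the order; each target is
    conditioned on every position before it in the order, including earlier targets."""
    order = list(perm)
    targets = order[len(order) - num_predict:]
    return {pos: sorted(order[:order.index(pos)]) for pos in targets}


def mlm_context(n: int, masked: Sequence[int]) -> Dict[int, List[int]]:
    """MLM (BERT-style): every masked position sees only the unmasked positions,
    so the predicted tokens are modeled independently of one another."""
    visible = sorted(set(range(n)) - set(masked))
    return {pos: visible for pos in masked}


if __name__ == "__main__":
    perm = [2, 0, 3, 1]            # hypothetical factorization order over 4 tokens
    print(plm_context(perm, 2))    # {3: [0, 2], 1: [0, 2, 3]}
    print(mlm_context(4, [3, 1]))  # {3: [0, 2], 1: [0, 2]}
```

In the example, predicted position 1 is conditioned on the other predicted position 3 under PLM but not under MLM. That is the dependency among predicted tokens that the snippet says BERT neglects.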

PVT v2: Improved Baselines with Pyramid Vision Transformer

16 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

We hope this work will facilitate state-of-the-art Transformer research in computer vision.

Ranked #23 on Object Detection on COCO-O

Image Classification Object Detection +1

29,758

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Ranked #5 on Semantic Segmentation on SynPASS

Image Classification Instance Segmentation +3

27,790

TaskBench: Benchmarking Large Language Models for Task Automation

1 code implementation • 30 Nov 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, Yueting Zhuang

To this end, we introduce TaskBench to evaluate the capability of LLMs in task automation.

Benchmarking Parameter Prediction

23,035

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

1 code implementation • NeurIPS 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence.

Philosophy

23,035

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

1 code implementation • 11 Jan 2024 • Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang

EasyTool purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i.e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents.

23,035

MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

1 code implementation • 18 Oct 2023 • Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data and the model applicability across platforms among various tasks.

Music Classification

4,187

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

1 code implementation • 9 Dec 2020 • Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry.

Sentence

4,186

DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling

1 code implementation • ACL 2021 • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu

In this paper, we develop DeepRapper, a Transformer-based rap generation system that can model both rhymes and rhythms.

Language Modelling

4,186

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

1 code implementation • 2 Dec 2022 • Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

1,286

MASS: Masked Sequence to Sequence Pre-training for Language Generation

7 code implementations • 7 May 2019 • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu

Pre-training and fine-tuning, e.g., BERT, have achieved great success in language understanding by transferring knowledge from the rich-resource pre-training task to low/zero-resource downstream tasks.

Ranked #2 on Unsupervised Machine Translation on WMT2014 English-French

Conversational Response Generation Response Generation +5

1,115

DiffusionNER: Boundary Diffusion for Named Entity Recognition

2 code implementations • 22 May 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans.

Ranked #2 on Nested Named Entity Recognition on GENIA

Chinese Named Entity Recognition Denoising +4
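
The DiffusionNER snippet describes generating entities by denoising noisy spans. The sketch below shows only the forward (noising) half of such a process on span boundaries, using the standard DDPM-style closed form. Several choices are assumptions made for illustration: the linear beta schedule, the [-1, 1] scaling of token indices, and all function names. The learned denoiser that would run the reverse process is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple


def alpha_bar(t: int, T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s), s = 1..t, for an assumed linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / max(T - 1, 1)
        prod *= 1.0 - beta
    return prod


def to_unit(index: int, seq_len: int) -> float:
    """Map a token index in [0, seq_len - 1] to [-1, 1]."""
    return 2.0 * index / (seq_len - 1) - 1.0


def to_index(x: float, seq_len: int) -> int:
    """Map a (denoised) boundary value back to the nearest valid token index."""
    idx = round((x + 1.0) / 2.0 * (seq_len - 1))
    return min(max(idx, 0), seq_len - 1)


def noise_boundaries(spans: Sequence[Tuple[int, int]], seq_len: int, t: int, T: int,
                     rng: random.Random) -> List[Tuple[float, float]]:
    """Forward process q(x_t | x_0): x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps,
    applied independently to each (start, end) boundary."""
    a = alpha_bar(t, T)
    noisy = []
    for start, end in spans:
        pair = []
        for b in (start, end):
            eps = rng.gauss(0.0, 1.0)
            pair.append(math.sqrt(a) * to_unit(b, seq_len) + math.sqrt(1.0 - a) * eps)
        noisy.append((pair[0], pair[1]))
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    gold = [(2, 4), (7, 7)]                          # toy entity spans in a 10-token sentence
    print(noise_boundaries(gold, 10, 0, 1000, rng))  # t = 0: no noise added
    print(noise_boundaries(gold, 10, 999, 1000, rng))  # t near T: close to pure noise
```

At t = 0 the boundaries come back unchanged, and `to_index` recovers the original token positions. As t approaches T, the signal term shrinks toward zero. Per the snippet, a trained model would then have to denoise such spans back into entities.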

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

1 code implementation • 15 Sep 2023 • Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort.

Evolutionary Algorithms

Double Path Networks for Sequence to Sequence Learning

1 code implementation • COLING 2018 • Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, Tie-Yan Liu

In this work, we propose Double Path Networks for Sequence to Sequence learning (DPN-S2S), which leverage the advantages of both convolutional and self-attention models by using double path information fusion.

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

1 code implementation • 25 May 2022 • Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li

Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost.

Causal Language Modeling Language Modelling +2
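
The cost argument in the Transcormer snippet can be made concrete with a small sketch. It counts forward passes for MLM-style pseudo-log-likelihood scoring versus CLM scoring. The two model callables are hypothetical stand-ins, shown here as toy uniform models. This is not Transcormer's own sliding language modeling, only an illustration of the two limitations listed.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"
VOCAB_SIZE = 4  # toy vocabulary size for the uniform stand-in models below

# Hypothetical model interfaces (assumptions, not a real library API):
#   mlm_logprob(masked_tokens, position, target) -> log P(target | masked sentence)
#   clm_logprobs(tokens) -> [log P(x_i | x_<i) for every i], from ONE forward pass
MLMFn = Callable[[List[str], int, str], float]
CLMFn = Callable[[Sequence[str]], List[float]]


def mlm_score(tokens: Sequence[str], mlm_logprob: MLMFn) -> Tuple[float, int]:
    """Pseudo-log-likelihood: mask each token in turn, so one forward pass per
    token, but every prediction sees bidirectional context."""
    total, passes = 0.0, 0
    for i, tok in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK
        total += mlm_logprob(masked, i, tok)
        passes += 1
    return total, passes


def clm_score(tokens: Sequence[str], clm_logprobs: CLMFn) -> Tuple[float, int]:
    """Causal LM: one forward pass scores the whole sentence, but each token is
    conditioned only on its left context."""
    return sum(clm_logprobs(tokens)), 1


def uniform_mlm(masked: List[str], i: int, tok: str) -> float:
    """Toy MLM that assigns every token probability 1 / VOCAB_SIZE."""
    return -math.log(VOCAB_SIZE)


def uniform_clm(tokens: Sequence[str]) -> List[float]:
    """Toy CLM that assigns every token probability 1 / VOCAB_SIZE."""
    return [-math.log(VOCAB_SIZE)] * len(tokens)


if __name__ == "__main__":
    sent = ["the", "cat", "sat", "on", "mats"]
    print(mlm_score(sent, uniform_mlm))  # second element: 5 forward passes
    print(clm_score(sent, uniform_clm))  # second element: 1 forward pass
```

The number of MLM passes grows linearly with sentence length. CLM needs one pass but gives up the right-hand context. This is the trade-off the snippet describes.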

Learning To Teach Large Language Models Logical Reasoning

1 code implementation • 13 Oct 2023 • Meiqi Chen, Yubo Ma, Kaitao Song, Yixin Cao, Yan Zhang, Dongsheng Li

Large language models (LLMs) have gained enormous attention from both academia and industry, due to their exceptional ability in language generation and extremely powerful generalization.

counterfactual Event Relation Extraction +4

Towards Understanding Omission in Dialogue Summarization

1 code implementation • 14 Nov 2022 • Yicheng Zou, Kaitao Song, Xu Tan, Zhongkai Fu, Qi Zhang, Dongsheng Li, Tao Gui

By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omission information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization.

Learning Domain Invariant Prompt for Vision-Language Models

1 code implementation • 8 Dec 2022 • Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples.

Ranked #4 on Prompt Engineering on Caltech-101

Domain Generalization Language Modelling +2

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

1 code implementation • 14 Mar 2023 • Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng

Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.

Ranked #1 on Vocal Bursts Intensity Prediction on HUME-VB

A-VB Culture A-VB High +6

Neural Machine Translation with Error Correction

1 code implementation • 21 Jul 2020 • Kaitao Song, Xu Tan, Jianfeng Lu

During training, neural machine translation (NMT) generates the next target token from the previous ground-truth target tokens, whereas during inference it uses the previously generated target tokens. This causes a discrepancy between training and inference as well as error propagation, which hurts translation accuracy.

Machine Translation NMT +1
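
The training/inference discrepancy described above can be illustrated with a toy decoder. The sketch contrasts teacher forcing with free-running greedy decoding. The `toy_step` lookup table is entirely hypothetical. The sketch shows only the problem, not the error-correction model the paper proposes.

```python
from typing import Callable, List, Sequence, Tuple

BOS = "<s>"


def teacher_forced_inputs(target: Sequence[str]) -> List[str]:
    """Training: the decoder input at step t is the ground-truth token t-1."""
    return [BOS] + list(target[:-1])


def free_running(step: Callable[[str], str], length: int) -> Tuple[List[str], List[str]]:
    """Inference: the decoder input at step t is the model's own prediction at t-1."""
    inputs, outputs = [BOS], []
    for _ in range(length):
        tok = step(inputs[-1])
        outputs.append(tok)
        inputs.append(tok)
    return inputs[:-1], outputs


# Hypothetical toy "decoder": predicts the next token from the previous one only.
# It gets the first token wrong ("a" instead of "the") but is otherwise accurate.
NEXT = {BOS: "a", "a": "dog", "dog": "ran", "the": "cat", "cat": "sat", "sat": "down"}


def toy_step(prev: str) -> str:
    return NEXT.get(prev, "<unk>")


if __name__ == "__main__":
    target = ["the", "cat", "sat", "down"]
    train_preds = [toy_step(x) for x in teacher_forced_inputs(target)]
    _, test_preds = free_running(toy_step, len(target))
    print(train_preds)  # ['a', 'cat', 'sat', 'down']: 3 of 4 correct
    print(test_preds)   # ['a', 'dog', 'ran', '<unk>']: the first error propagates
```

Under teacher forcing, the single early mistake costs one token. At inference, the mistake feeds back into the decoder input and derails every later step.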

Hybrid Self-Attention Network for Machine Translation

no code implementations • 1 Nov 2018 • Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu

The encoder-decoder is the typical framework for Neural Machine Translation (NMT), and different structures have been developed for improving the translation performance.

Machine Translation NMT +1

Generating Adversarial Examples With Conditional Generative Adversarial Net

no code implementations • 18 Mar 2019 • Ping Yu, Kaitao Song, Jianfeng Lu

Deep neural networks have recently made significant progress and been successfully applied in various fields, but they are vulnerable to attack instances, e.g., adversarial examples.

Generative Adversarial Network

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning

no code implementations • 27 Apr 2020 • Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu

While pre-training and fine-tuning, e.g., BERT (Devlin et al., 2018) and GPT-2 (Radford et al., 2019), have achieved great success in language understanding and generation tasks, the pre-trained models are usually too big for online deployment in terms of both memory cost and inference speed, which hinders them from practical online usage.

Knowledge Distillation Language Modelling

Task-Agnostic and Adaptive-Size BERT Compression

no code implementations • 1 Jan 2021 • Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Li Jian, Tao Qin, Tie-Yan Liu

NAS-BERT trains a big supernet on a carefully designed search space containing various architectures and outputs multiple compressed models with adaptive sizes and latency.

Language Modelling Model Compression +1

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

no code implementations • 30 May 2021 • Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly.

Language Modelling Model Compression +1

Analyzing and Mitigating Interference in Neural Architecture Search

no code implementations • 29 Aug 2021 • Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li

In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar.

Neural Architecture Search Reading Comprehension
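
The measurement in this snippet, the gradient similarity of a shared operator across child models, can be sketched as follows. Two things here are assumptions: cosine similarity as the similarity measure, and the "shared operator" itself, which is a toy linear map with squared loss rather than a real supernet. The toy result only echoes observation 2) in spirit.

```python
import math
from typing import List, Sequence


def cosine_similarity(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same shape")
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("zero gradient")
    return dot / (n1 * n2)


def linear_grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient w.r.t. w of 0.5 * (w.x - y)^2 for a shared linear operator."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


if __name__ == "__main__":
    w = [0.5, -0.5]                               # weights of the shared operator
    g_a = linear_grad(w, x=[1.0, 0.0], y=1.0)     # child model A's input to the operator
    g_b = linear_grad(w, x=[1.0, 0.1], y=1.0)     # child B: similar input
    g_c = linear_grad(w, x=[0.0, 1.0], y=1.0)     # child C: very different input
    print(round(cosine_similarity(g_a, g_b), 3))  # 0.995: updates nearly agree
    print(round(cosine_similarity(g_a, g_c), 3))  # 0.0: orthogonal updates
```

A low or negative similarity means the two child models push the shared weights in conflicting directions. That conflict is the interference the paper analyzes.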

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, prior works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with TTS fine-tuning, which takes phonemes as input.

Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

no code implementations • 26 Jun 2022 • Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio, Dongsheng Li

Autoregressive generative models are commonly used, especially for those tasks involving sequential data.

Image Generation Language Modelling +1

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

no code implementations • 10 Aug 2022 • Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective on a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

no code implementations • 14 Mar 2023 • Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.

Alzheimer's Disease Detection Binary Classification

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

no code implementations • 31 May 2023 • Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu

Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt design has a great impact.

Text Generation

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

no code implementations • 5 Jun 2023 • Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu

Pronunciation assessment is a major challenge in computer-aided pronunciation training systems, especially at the word (phoneme) level.

PromptTTS 2: Describing and Generating Voices with Text Prompt

no code implementations • 5 Sep 2023 • Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors are required to write text prompts for speech at a large data-labeling cost.

Language Modelling Large Language Model

EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation Model

no code implementations • 11 Jan 2024 • Yuqi Chen, Kan Ren, Kaitao Song, Yansen Wang, Yifan Wang, Dongsheng Li, Lili Qiu

Self-supervised learning has emerged as a highly effective approach in the fields of natural language processing and computer vision.

Anomaly Detection EEG +2

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations • 5 Mar 2024 • Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.

Quantization Speech Synthesis
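
Here is a minimal sketch of the factorization idea: one latent frame is split into named subspaces, and each subspace is quantized with its own codebook. The subspace names, sizes, and codebooks below are hypothetical. NaturalSpeech 3's actual codec learns its projections and codebooks with training losses not shown here. This therefore illustrates factorized vector quantization in general, not the paper's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Codebook = List[List[float]]


def nearest_code(vec: Sequence[float], codebook: Codebook) -> Tuple[int, List[float]]:
    """Index and value of the codeword closest to `vec` (squared Euclidean distance)."""
    def dist(k: int) -> float:
        return sum((v - c) ** 2 for v, c in zip(vec, codebook[k]))
    best = min(range(len(codebook)), key=dist)
    return best, codebook[best]


def factorized_vq(latent: Sequence[float],
                  codebooks: Dict[str, Codebook]) -> Tuple[Dict[str, int], List[float]]:
    """Split `latent` into consecutive subspaces (one per codebook, in dict order)
    and quantize each subspace independently with its own codebook."""
    codes: Dict[str, int] = {}
    recon: List[float] = []
    offset = 0
    for name, book in codebooks.items():
        dim = len(book[0])
        idx, code = nearest_code(latent[offset:offset + dim], book)
        codes[name] = idx
        recon.extend(code)
        offset += dim
    if offset != len(latent):
        raise ValueError("subspace dimensions must add up to the latent size")
    return codes, recon


if __name__ == "__main__":
    books = {
        "content": [[0.0, 0.0], [1.0, 1.0]],  # hypothetical 2-D subspace, 2 codewords
        "prosody": [[0.0], [0.5], [1.0]],     # hypothetical 1-D subspace, 3 codewords
    }
    print(factorized_vq([0.9, 1.2, 0.4], books))
    # ({'content': 1, 'prosody': 1}, [1.0, 1.0, 0.5])
```

Because each subspace has its own codebook, changing the prosody code leaves the content code untouched. Separating the attributes in this way is the kind of disentanglement the snippet attributes to FVQ.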
