Search Results for author: Xiaohuan Zhou

Found 10 papers, 7 papers with code

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

no code implementations • 12 Feb 2024 • Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, YuanJun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

By revealing the limitations of existing LALMs through evaluation results, AIR-Bench can provide insights into the direction of future research.

2k Automatic Speech Recognition +4

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

2 code implementations • 14 Nov 2023 • Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

Ranked #1 on Acoustic Scene Classification on TUT Acoustic Scenes 2017 (using extra training data)

Acoustic Scene Classification Audio captioning +4

3,256

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio captioning Automatic Speech Recognition +11

274

Qwen Technical Report

2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.

Ranked #3 on Multi-Label Text Classification on CC3M-TagMask

Language Modelling Large Language Model +2

10,842

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

2 code implementations • 18 May 2023 • Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

Ranked #1 on Semantic Segmentation on ADE20K (using extra training data)

Action Classification AudioCaps +16

6,039

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

1 code implementation • 8 Dec 2022 • Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou

As a starting point, we provide presets of 7 different modalities and 23 highly-diverse example tasks in OFASys, with which we also develop a first-in-kind, single model, OFA+, that can handle text, image, speech, video, and motion data.

Multi-Task Learning

142

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

1 code implementation • 29 Nov 2022 • Xiaohuan Zhou, JiaMing Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou

Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text.

Ranked #2 on Speech Recognition on AISHELL-1

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

2,321

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

Speech2Slot: An End-to-End Knowledge-based Slot Filling from Speech

no code implementations • 10 May 2021 • Pengwei Wang, Xin Ye, Xiaohuan Zhou, Jinghui Xie, Hao Wang

In contrast to conventional pipeline Spoken Language Understanding (SLU) which consists of automatic speech recognition (ASR) and natural language understanding (NLU), end-to-end SLU infers the semantic meaning directly from speech and overcomes the error propagation caused by ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems

19 code implementations • 14 Mar 2018 • Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, Guangzhong Sun

On one hand, the xDeepFM is able to learn certain bounded-degree feature interactions explicitly; on the other hand, it can learn arbitrary low- and high-order feature interactions implicitly.

Ranked #1 on Click-Through Rate Prediction on Dianping

Click-Through Rate Prediction Recommendation Systems

17,957
