Search Results for author: Dong Yu

Found 288 papers, 119 papers with code

End-to-End Chinese Speaker Identification

1 code implementation NAACL 2022 Dian Yu, Ben Zhou, Dong Yu

End-to-end SI systems, on the other hand, are not limited by individual modules, but suffer from insufficient training data from the existing small-scale datasets.

coreference-resolution Machine Reading Comprehension +4

Variational Graph Autoencoding as Cheap Supervision for AMR Coreference Resolution

no code implementations ACL 2022 Irene Li, Linfeng Song, Kun Xu, Dong Yu

Coreference resolution over semantic graphs like AMRs aims to group the graph nodes that represent the same entity.

coreference-resolution

Instance-adaptive training with noise-robust losses against noisy labels

no code implementations EMNLP 2021 Lifeng Jin, Linfeng Song, Kun Xu, Dong Yu

In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data.

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging

no code implementations EMNLP 2021 Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context.

Dialogue Rewriting Text Generation

Construction of a Chinese Moral Dictionary for Artificial Intelligence Ethical Computing

no code implementations CCL 2020 Hongrui Wang, Chang Liu, Dong Yu

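Building moral dictionary resources is a research focus in ethical computing for artificial intelligence. Because moral behavior is complex and varied, the classification systems of existing English moral dictionaries are incomplete, and no comparable dictionary resource yet exists for Chinese; the theoretical framework and construction methods remain to be explored. To address these problems, this paper proposes the task of constructing a Chinese moral dictionary for AI ethical computing, designs four categories of labels and four types, and obtains a Chinese moral dictionary resource containing 25,012 words. Experimental results show that the resource not only enables machines to learn moral knowledge and judge the moral label and type of a word, but also provides data support for sentence-level moral text analysis.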

The method of calculating sentence readability combined with deep learning and language difficulty characteristics

no code implementations CCL 2020 Yuling Tang, Dong Yu

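This paper proposes an improved method for constructing readability corpora and, based on it, builds a larger corpus of Chinese sentence readability. The corpus achieves an accuracy of 0.7869 on the task of assessing absolute sentence difficulty, an improvement of more than 0.15 over previous work, demonstrating the effectiveness of the improved method. We apply deep learning methods to Chinese readability assessment, explore the ability of different deep learning methods to capture difficulty features automatically, and further explore how incorporating linguistic difficulty features at different levels into the deep learning features affects overall model performance. Experimental results show that different deep learning models differ in their ability to capture difficulty features, and that linguistic difficulty features can improve, to varying degrees, the difficulty representation ability of deep learning models.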

Sentence

Lifelong Learning of Large Language Model based Agents: A Roadmap

1 code implementation 13 Jan 2025 Junhao Zheng, Chengming Shi, Xidi Cai, Qiuke Li, Duzhen Zhang, Chenxing Li, Dong Yu, Qianli Ma

This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into LLM-based agents.

Incremental Learning Language Modeling +3

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

no code implementations 2 Jan 2025 Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng

The pre-trained BERT model extracts semantic features from a raw Chinese character sequence and the NN based classifier predicts the polyphonic character's pronunciation according to BERT output.

Polyphone disambiguation Sentence +1

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

no code implementations 30 Dec 2024 Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

The remarkable performance of models like the OpenAI o1 can be attributed to their ability to emulate human-like long-time thinking during inference.

GSM8K

A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression

no code implementations 23 Dec 2024 Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou

In this work, we provide a thorough investigation of gist-based context compression methods to improve long-context processing in large language models.

Teaching LLMs to Refine with Tools

no code implementations 22 Dec 2024 Dian Yu, Yuheng Zhang, Jiahao Xu, Tian Liang, Linfeng Song, Zhaopeng Tu, Haitao Mi, Dong Yu

We propose CaP, a novel approach that uses external tools to refine chain-of-thought (CoT) responses generated by the same or other LLMs.

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

no code implementations 26 Nov 2024 Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu

To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width.

Quantization

Federated Incremental Named Entity Recognition

no code implementations 18 Nov 2024 Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dong Yu

In a more realistic scenario, local clients receive new entity types continuously, while new local clients collecting novel data may irregularly join the global FNER training.

Knowledge Distillation named-entity-recognition +3

Evaluating Moral Beliefs across LLMs through a Pluralistic Framework

1 code implementation 6 Nov 2024 Xuelin Liu, Yanfei Zhu, Shucheng Zhu, Pengyuan Liu, Ying Liu, Dong Yu

Additionally, through moral debates, we investigate how firmly these models hold to their moral choices.

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

1 code implementation 25 Oct 2024 Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu

In this paper, we introduce an open-source framework designed to facilitate the development of multimodal web agents that can autonomously conduct real-world exploration and improve themselves.

Imitation Learning

LoGU: Long-form Generation with Uncertainty Expressions

1 code implementation 18 Oct 2024 Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Sen Yang, Nigel Collier, Dong Yu, Deqing Yang

To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline.

Instruction Following

Atomic Calibration of LLMs in Long-Form Generations

no code implementations 17 Oct 2024 Caiqi Zhang, Ruihan Yang, Zhisong Zhang, Xinting Huang, Sen Yang, Dong Yu, Nigel Collier

Existing research on LLM calibration has primarily focused on short-form tasks, providing a single confidence score at the response level (macro calibration).

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

1 code implementation 17 Oct 2024 Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu

Traditional transformer models often allocate a fixed amount of computational resources to every input token, leading to inefficient and unnecessary computation.

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

1 code implementation 14 Oct 2024 Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu

Recent large language model (LLM)-driven chat assistant systems have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses.

Benchmarking Large Language Model +1

ParallelSpec: Parallel Drafter for Efficient Speculative Decoding

no code implementations 8 Oct 2024 Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu

Speculative decoding has proven to be an efficient solution to large language model (LLM) inference, where the small drafter predicts future tokens at a low cost, and the target model is leveraged to verify them in parallel.

Language Modeling Language Modelling +2

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

no code implementations 4 Oct 2024 Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu

In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM.

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

1 code implementation 3 Oct 2024 Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu

Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories.

Code Generation

DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning

no code implementations 2 Oct 2024 Yebowen Hu, Xiaoyang Wang, Wenlin Yao, Yiming Lu, Daoan Zhang, Hassan Foroosh, Dong Yu, Fei Liu

In this paper, we introduce DeFine, a new framework that constructs probabilistic factor profiles from complex scenarios.

Decision Making

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

1 code implementation 2 Oct 2024 Mengzhao Jia, Wenhao Yu, Kaixin Ma, Tianqing Fang, Zhihan Zhang, Siru Ouyang, Hongming Zhang, Meng Jiang, Dong Yu

Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but also reasoning about inter-relationships and logical flows across multiple visual inputs.

Language Modeling Language Modelling

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

no code implementations 2 Oct 2024 Hsin-Tien Chiang, Hao Zhang, Yong Xu, Meng Yu, Dong Yu

In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often lead to over-suppressed speech, creating artifacts during listening and harming downstream task performance.

Quantization Speech Enhancement

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows

1 code implementation 25 Sep 2024 Wenlin Yao, Haitao Mi, Dong Yu

Despite recent advancements in large language models (LLMs), their performance on complex reasoning problems requiring multi-step thinking and combining various skills is still limited.

Computational Efficiency

Video-to-Audio Generation with Fine-grained Temporal Semantics

no code implementations 23 Sep 2024 Yuchen Hu, Yu Gu, Chenxing Li, Rilin Chen, Dong Yu

With recent advances in AIGC, video generation has gained a surge of research interest in both academia and industry (e.g., Sora).

Audio Generation Video Generation

Preference Alignment Improves Language Model-Based TTS

no code implementations 19 Sep 2024 Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based systems offer performance competitive with their counterparts.

Language Modeling Language Modelling +1

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

no code implementations 17 Sep 2024 Jiarui Hai, Yong Xu, Hao Zhang, Chenxing Li, Helin Wang, Mounya Elhilali, Dong Yu

Latent diffusion models have shown promising results in text-to-audio (T2A) generation tasks, yet previous models have encountered difficulties in generation quality, computational cost, diffusion sampling, and data preparation.

Audio Generation

Towards Diverse and Efficient Audio Captioning via Diffusion Models

no code implementations 14 Sep 2024 Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Ruibo Fu, Wei Liang, Dong Yu

We introduce Diffusion-based Audio Captioning (DAC), a non-autoregressive diffusion model tailored for diverse and efficient audio captioning.

Audio captioning Diversity +1

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

1 code implementation 12 Sep 2024 Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu

To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents with realistic tasks.

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

1 code implementation 11 Sep 2024 Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu

In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis.

Decoder Speech Synthesis +2

Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array

no code implementations 11 Sep 2024 Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality.

Comparing Discrete and Continuous Space LLMs for Speech Recognition

no code implementations 1 Sep 2024 Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu

This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervised for both discrete and continuous types.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Advancing Multi-talker ASR Performance with Large Language Models

no code implementations 30 Aug 2024 Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

no code implementations 28 Aug 2024 Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Haitao Mi, Dong Yu

There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding.

Data Augmentation GSM8K +2

Video-to-Audio Generation with Hidden Alignment

no code implementations 10 Jul 2024 Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techniques.

Audio Generation Data Augmentation +2

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

no code implementations 30 Jun 2024 Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

Specifically, we formulate the problem as a two-player game and propose a novel online algorithm, iterative Nash policy optimization (INPO).

LiteSearch: Efficacious Tree Search for LLM

no code implementations 29 Jun 2024 Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

Recent research suggests that tree search algorithms (e.g., Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks.

GSM8K Mathematical Reasoning

Scaling Synthetic Data Creation with 1,000,000,000 Personas

3 code implementations 28 Jun 2024 Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, Dong Yu

We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data.

Language Modeling Language Modelling +3

Abstraction-of-Thought Makes Language Models Better Reasoners

1 code implementation 18 Jun 2024 Ruixin Hong, Hongming Zhang, Xiaoman Pan, Dong Yu, ChangShui Zhang

Abstract reasoning, the ability to reason from the abstract essence of a problem, serves as a key to generalization in human reasoning.

When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives

1 code implementation 17 Jun 2024 Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu

Finally, the effectiveness of reasoning is influenced by narrative complexity, information density, and domain-specific terms, highlighting the challenges in analytical reasoning tasks.

Attribute

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment

no code implementations 13 Jun 2024 Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

1 code implementation 3 Jun 2024 Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu

While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs.

Audio Generation In-Context Learning +2

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

1 code implementation 29 May 2024 Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu

We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding.

Benchmarking Dialogue Understanding +5

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

1 code implementation 18 Apr 2024 Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning.

Ranked #1 on GSM8K

Arithmetic Reasoning GSM8K +3

Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

no code implementations 14 Apr 2024 Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

Current state-of-the-art approaches refine decoding by contrasting early-exit distributions from a lower layer with the final layer to exploit information related to factuality within the model forward procedure.

Hallucination

Polarity Calibration for Opinion Summarization

1 code implementation 2 Apr 2024 Yuanyuan Lei, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Ruihong Huang, Dong Yu

To address this issue and make the summarizer express both sides of opinions, we introduce the concept of polarity calibration, which aims to align the polarity of output summary with that of input text.

Opinion Summarization

Conceptual and Unbiased Reasoning in Language Models

no code implementations 30 Mar 2024 Ben Zhou, Hongming Zhang, Sihao Chen, Dian Yu, Hongwei Wang, Baolin Peng, Dan Roth, Dong Yu

Conceptual reasoning, the ability to reason in abstract and high-level perspectives, is key to generalization in human cognition.

Decision Making

Self-Consistency Boosts Calibration for Math Reasoning

no code implementations 14 Mar 2024 Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development.

GSM8K Math

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

1 code implementation 6 Mar 2024 Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

In this paper, we present a high-quality benchmark named multi-source Wizard of Wikipedia (Ms.WoW) for evaluating multi-source dialogue knowledge selection and response generation.

Dialogue Generation Response Generation

Can Large Language Models do Analytical Reasoning?

no code implementations 6 Mar 2024 Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

Our analytical reasoning task asks large language models to count how many points each team scores in a quarter of NBA and NFL games.

Language Modelling Large Language Model

Collaborative decoding of critical tokens for boosting factuality of large language models

no code implementations 28 Feb 2024 Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model.

Hallucination Instruction Following

Fine-Grained Self-Endorsement Improves Factuality and Reasoning

no code implementations 23 Feb 2024 Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs.

GSM8K Language Modeling +3

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

1 code implementation 18 Feb 2024 Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation Data Visualization

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

2 code implementations 15 Feb 2024 Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems.

Reinforcement Learning (RL)

SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs

no code implementations 15 Feb 2024 Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

In this paper, we introduce four novel tasks centered around sports data analytics to evaluate the numerical reasoning and information fusion capabilities of LLMs.

SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization

no code implementations 31 Jan 2024 Sangwoo Cho, Kaiqiang Song, Chao Zhao, Xiaoyang Wang, Dong Yu

Multi-turn dialogues are characterized by their extended length and the presence of turn-taking conversations.

Diversity Language Modeling +2

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

2 code implementations 25 Jan 2024 Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu

The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents.

MM-LLMs: Recent Advances in MultiModal Large Language Models

no code implementations 24 Jan 2024 Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies.

Decision Making Survey

Inconsistent dialogue responses and how to recover from them

1 code implementation 18 Jan 2024 Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu

One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs and facts, which has been shown to be a difficult problem.

InFoBench: Evaluating Instruction Following Ability in Large Language Models

1 code implementation 7 Jan 2024 Yiwei Qin, Kaiqiang Song, Yebowen Hu, Wenlin Yao, Sangwoo Cho, Xiaoyang Wang, Xuansheng Wu, Fei Liu, PengFei Liu, Dong Yu

This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions.

Instruction Following

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention

no code implementations 14 Dec 2023 Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

This paper introduces a novel approach to enhance the capabilities of Large Language Models (LLMs) in processing and understanding extensive text sequences, a critical aspect in applications requiring deep comprehension and synthesis of large volumes of information.

Dense X Retrieval: What Retrieval Granularity Should We Use?

3 code implementations 11 Dec 2023 Tong Chen, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Hongming Zhang, Dong Yu

We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks.

Retrieval Sentence +1

Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

no code implementations 22 Nov 2023 Meng Yu, Dong Yu

Audio zooming, a signal processing technique, enables selective focusing and enhancement of sound signals from a specified region, attenuating others.

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning

6 code implementations 15 Nov 2023 Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu

Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.

Chart Understanding

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

no code implementations 15 Nov 2023 Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios.

Hallucination Retrieval

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning

1 code implementation 14 Nov 2023 Ruixin Hong, Hongming Zhang, Xinyu Pang, Dong Yu, ChangShui Zhang

In this paper, we take a closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies accurately.

Logical Fallacies Logical Reasoning

TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs

1 code implementation 9 Nov 2023 Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu

We construct a hierarchical task tree encompassing 7 major areas, over 200 categories, and over 800 tasks, which covers diverse capabilities such as question answering, reasoning, multi-turn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner.

Benchmarking Question Answering +1

Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

1 code implementation 7 Nov 2023 Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu

We introduce sub-sentence encoder, a contrastively-learned contextual embedding model for fine-grained semantic representation of text.

Contrastive Learning Semantic Similarity +3

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations 25 Oct 2023 Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization Speaker Diarization +3

On the Dimensionality of Sentence Embeddings

no code implementations 23 Oct 2023 Hongwei Wang, Hongming Zhang, Dong Yu

Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimension scenarios.

Sentence Sentence Classification +3

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

1 code implementation 20 Oct 2023 Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng

Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.

Decoder Multimodal Machine Translation +3

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations 2 Oct 2023 Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising Self-Supervised Learning +2

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

no code implementations 27 Sep 2023 Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

no code implementations 27 Sep 2023 Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.

Acoustic echo cancellation

Stabilizing RLHF through Advantage Model and Selective Rehearsal

no code implementations 18 Sep 2023 Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu

Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using RLHF remains a significant challenge.

LASER: LLM Agent with State-Space Exploration for Web Navigation

1 code implementation 15 Sep 2023 Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, Dong Yu

We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon.com.

Decision Making

Unsupervised Multi-document Summarization with Holistic Inference

no code implementations 8 Sep 2023 Haopeng Zhang, Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Hongwei Wang, Jiawei Zhang, Dong Yu

SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners.

Diversity Document Summarization +2

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

no code implementations 4 Sep 2023 Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Mapping two modalities, speech and text, into a shared representation space is a research direction for using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

1 code implementation 19 Aug 2023 Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

While the vanilla transducer does not have a prior preference for any of the valid paths, this work intends to enforce the preferred paths and achieve controllable alignment prediction.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

no code implementations 8 Jul 2023 Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu

Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations.

Hallucination

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

no code implementations 30 May 2023 Rongjie Huang, Chunlei Zhang, Yongqi Wang, Dongchao Yang, Luping Liu, Zhenhui Ye, Ziyue Jiang, Chao Weng, Zhou Zhao, Dong Yu

Various applications of voice synthesis have been developed independently despite the fact that they generate "voice" as output in common.

Singing Voice Synthesis Text to Speech +1

Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations

1 code implementation 24 May 2023 James Y. Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu

It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.

Decoder Semantic Similarity +5

PIVOINE: Instruction Tuning for Open-world Information Extraction

1 code implementation 24 May 2023 Keming Lu, Xiaoman Pan, Kaiqiang Song, Hongming Zhang, Dong Yu, Jianshu Chen

In particular, we construct INSTRUCTOPENWIKI, a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.

Instruction Following Language Modeling +2

Open-Domain Event Graph Induction for Mitigating Framing Bias

no code implementations 22 May 2023 Siyi Liu, Hongming Zhang, Hongwei Wang, Kaiqiang Song, Dan Roth, Dong Yu

However, none of the existing methods have explicitly addressed the issue of framing bias that is inherent in news articles.

Faithful Question Answering with Monte-Carlo Planning

1 code implementation 4 May 2023 Ruixin Hong, Hongming Zhang, Hong Zhao, Dong Yu, ChangShui Zhang

In this paper, we propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.

Decision Making Question Answering +1

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

no code implementations 4 May 2023 Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via a teacher-forced training strategy are used to train the deep neural network (DNN).

Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings

no code implementations 2 May 2023 Hao Zhang, Meng Yu, Dong Yu

In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes their joint suppression difficult.

Speech Separation

Deep AHS: A Deep Learning Approach to Acoustic Howling Suppression

no code implementations 18 Feb 2023 Hao Zhang, Meng Yu, Dong Yu

In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.

Deep Learning Speech Separation

Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

1 code implementation 16 Feb 2023 Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu

We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation.

Chatbot Response Generation

Friend-training: Learning from Models of Different but Related Tasks

no code implementations 31 Jan 2023 Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu

Current self-training methods such as standard self-training, co-training, tri-training, and others often focus on improving model performance on a single task, utilizing differences in input features, model architectures, and training processes.

Dialogue Rewriting Dialogue Understanding +1

Neural Target Speech Extraction: An Overview

1 code implementation 31 Jan 2023 Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu

Humans can listen to a target speaker even in challenging acoustic conditions that have noise, reverberation, and interfering speakers.

Speech Extraction

NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

no code implementations 29 Jan 2023 Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.

Acoustic echo cancellation

OASum: Large-Scale Open Domain Aspect-based Summarization

1 code implementation 19 Dec 2022 Xianjun Yang, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Xiaoman Pan, Linda Petzold, Dong Yu

Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model.

TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR

no code implementations 12 Dec 2022 Lixin Cao, Jun Wang, Ben Yang, Dan Su, Dong Yu

Self-supervised learning (SSL) models confront challenges of abrupt informational collapse or slow dimensional collapse.

Self-Supervised Learning

ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion

1 code implementation 6 Dec 2022 Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen

However, there has been limited research on the zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base.

Knowledge Base Completion Knowledge Graphs

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

no code implementations 22 Nov 2022 Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

While current deep learning (DL)-based beamforming techniques have proved effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times, making them unsuitable for real-world use.

Speech Separation

Efficient Zero-shot Event Extraction with Context-Definition Alignment

1 code implementation 9 Nov 2022 Hongming Zhang, Wenlin Yao, Dong Yu

We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately.

Contrastive Learning Sentence +1

Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing

no code implementations 8 Nov 2022 Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu

Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors.

Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

no code implementations 28 Oct 2022 Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory.

Common Sense Reasoning Coreference Resolution +8

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

1 code implementation 22 Oct 2022 Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

While previous work focuses on building systems for inducing grammars on text that is well aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.

Language Acquisition Video Alignment

Salience Allocation as Guidance for Abstractive Summarization

1 code implementation 22 Oct 2022 Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu

Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.

Abstractive Text Summarization

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

no code implementations 14 Oct 2022 Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe

Besides predicting the target sequence, a side product of CTC is to predict the alignment, which is the most probable input-long sequence that specifies a hard aligning relationship between the input and target units.

Cross-Lingual Speaker Identification Using Distant Supervision

1 code implementation 11 Oct 2022 Ben Zhou, Dian Yu, Dong Yu, Dan Roth

Speaker identification, determining which character said each utterance in literary text, benefits many downstream tasks.

Language Modeling Language Modelling +1

Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks

1 code implementation 1 Oct 2022 Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, Heng Ji

Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.

Language Modeling Language Modelling +3

C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

no code implementations 15 Aug 2022 Chunlei Zhang, Dong Yu

On the basis of the pretrained CSSL model, we further propose to employ a negative-sample-free SSL objective (i.e., DINO) to fine-tune the speaker embedding network.

Contrastive Learning Self-Supervised Learning +1

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

1 code implementation 20 Jul 2022 Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.

Ranked #15 on Audio Generation on AudioCaps (FD metric)

Audio Generation Decoder

Hierarchical Context Tagging for Utterance Rewriting

1 code implementation 22 Jun 2022 Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea

HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context.

TAG

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

1 code implementation 16 Jun 2022 Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu

Prosodic boundaries play an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability.

Speech Synthesis Text to Speech +2

Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE

no code implementations 6 Jun 2022 Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

We leverage recent advancements in self-supervised speech representation learning as well as speech synthesis front-end techniques for system development.

Representation Learning Speech Representation Learning +3

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

1 code implementation 5 Jun 2022 Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu

Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations 20 May 2022 Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, as well as in front-end speech enhancement for recognition when the loudspeaker is playing back.

Acoustic echo cancellation Speech Enhancement +2

Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

1 code implementation 11 May 2022 Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

In our experiment on the VCTK dataset, we demonstrate that content embeddings derived from the conditional DSVAE overcome the randomness and achieve much better phoneme classification accuracy, stabilized vocalization, and better zero-shot VC performance compared with the competitive DSVAE baseline.

Voice Conversion

Distant finetuning with discourse relations for stance classification

no code implementations 27 Apr 2022 Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu

Approaches to stance classification, an important task for understanding argumentation in debates and detecting fake news, have relied on models that deal with individual debate topics.

Classification Stance Classification