Search Results for author: Jiayang Song

Found 12 papers, 2 papers with code

Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems

No code implementations · 29 Nov 2024 · Shengming Zhao, Yuheng Huang, Jiayang Song, Zhijie Wang, Chengcheng Wan, Lei Ma

Retrieval-Augmented Generation (RAG) is a pivotal technique for enhancing the capability of large language models (LLMs) and has demonstrated promising efficacy across a diverse spectrum of tasks.

Tasks: RAG, Retrieval (+1 more)

LeCov: Multi-level Testing Criteria for Large Language Models

No code implementations · 20 Aug 2024 · Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma

Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are from various perspectives, e.g., truthfulness and toxicity.

AcTracer: Active Testing of Large Language Model via Multi-Stage Sampling

No code implementations · 7 Aug 2024 · Yuheng Huang, Jiayang Song, Qiang Hu, Felix Juefei-Xu, Lei Ma

Given that LLMs' diverse task-handling abilities stem from large volumes of training data, a comprehensive evaluation also necessitates abundant, well-annotated, and representative test data to assess LLM performance across various downstream tasks.

Tasks: Language Modeling, Language Modelling (+1 more)

Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

No code implementations · 10 Jul 2024 · Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma

As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and aligning LLM behaviors with human preferences and ethical standards.

Tasks: Safety Alignment

GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

No code implementations · 6 Jun 2024 · Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

To address this issue, in this work we introduce a novel Generalizable Safety enhancer (GenSafe) that is able to overcome the challenge of data insufficiency and enhance the performance of SRL approaches.

Tasks: Autonomous Vehicles, Deep Reinforcement Learning (+2 more)

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

No code implementations · 12 Apr 2024 · Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma

To bridge this gap, in this work we conduct a comprehensive evaluation of the effectiveness of existing online safety analysis methods on LLMs.

Tasks: Fairness

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

No code implementations · 22 Oct 2023 · Da Song, Xuan Xie, Jiayang Song, Derui Zhu, Yuheng Huang, Felix Juefei-Xu, Lei Ma

… the trustworthiness perspective, is bound to and enriches the abstract model with semantics, which enables more detailed analysis applications for diverse purposes.

ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning

2 code implementations · 26 Aug 2023 · Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, Lei Ma

Motivated by the substantial achievements of Large Language Models (LLMs) in natural language processing, recent research has begun investigating the application of LLMs to complex, long-horizon sequential task planning in robotics.

Tasks: Language Modeling, Language Modelling (+2 more)

Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models

No code implementations · 16 Jul 2023 · Yuheng Huang, Jiayang Song, Zhijie Wang, Shengming Zhao, Huaming Chen, Felix Juefei-Xu, Lei Ma

In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques could help characterize the prediction risks of LLMs.

Tasks: Code Generation, Hallucination (+1 more)
