Search Results for author: Qingyang Li

Found 26 papers, 6 papers with code

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective

1 code implementation 29 May 2025 Sheng Ouyang, Yulan Hu, Ge Chen, Qingyang Li, Fuzheng Zhang, Yong Liu

Specifically, we model preference learning as a resource allocation problem, treating rewards as resources to be allocated while considering the trade-off between utility and fairness in their distribution.

Fairness reinforcement-learning +1

Large Language Models powered Malicious Traffic Detection: Architecture, Opportunities and Case Study

no code implementations 24 Mar 2025 Xinggong Zhang, Haotian Meng, Qingyang Li, Yunpeng Tan, Lei Zhang

In this paper, we focus on unleashing the full potential of Large Language Models (LLMs) in malicious traffic detection.

Traffic Classification

SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin

no code implementations 19 Feb 2025 Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong Liu

Recently, enhancing the numerical and logical reasoning capability of Large Language Models (LLMs) has emerged as a research hotspot.

Logical Reasoning Policy Gradient Methods +2

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

no code implementations 25 Nov 2024 Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong Liu

To address these issues, we propose a high-quality VQA preference dataset, called Multiple Multimodal Artificial Intelligence Preference Datasets in VQA (MMAIP-V), which is constructed by sampling from the response distribution set and using an external scoring function for response evaluation.

Visual Question Answering (VQA)

MoCoKGC: Momentum Contrast Entity Encoding for Knowledge Graph Completion

no code implementations Empirical Methods in Natural Language Processing 2024 Qingyang Li, Yanru Zhong, YuChu Qin

However, existing approaches have not effectively combined the structural attributes of knowledge graphs with the textual descriptions of entities to generate robust entity encodings. To address this issue, this paper proposes MoCoKGC (Momentum Contrast Entity Encoding for Knowledge Graph Completion), which incorporates three primary encoders: the entity-relation encoder, the entity encoder, and the momentum entity encoder.

Contrastive Learning Link Prediction +1

Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training

no code implementations 1 Oct 2024 Qingyang Li, Weimao Ke

This paper examines the pivotal role of dropout techniques in mitigating overfitting in language model training.

Decoder Language Modeling +1

TSO: Self-Training with Scaled Preference Optimization

no code implementations 31 Aug 2024 Kaihui Chen, Hao Yi, Qingyang Li, Tianyu Qi, Yulan Hu, Fuzheng Zhang, Yong Liu

Meanwhile, numerous iterative methods require additional training of reward models to select positive and negative samples from the model's own generated responses for preference learning.

Diversity

Towards Comprehensive Preference Data Collection for Reward Modeling

no code implementations 24 Jun 2024 Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated.

Diversity Response Generation

Synthetic Dialogue Dataset Generation using LLM Agents

1 code implementation 30 Jan 2024 Yelaman Abdullin, Diego Molla-Aliod, Bahadorreza Ofoghi, John Yearwood, Qingyang Li

We conduct human and automatic evaluations, including an evaluation approach that uses GPT-4 to mimic the human evaluation metrics.

Dataset Generation Prompt Engineering

Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios

no code implementations 14 Nov 2023 Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality.

All Decoder +1

Comparative study of microgrid optimal scheduling under multi-optimization algorithm fusion

no code implementations 3 Oct 2023 Hongyi Duan, Qingyang Li, Yuchen Li, Jianan Zhang, Yuming Xie

As global attention on renewable and clean energy grows, the research and implementation of microgrids become paramount.

Scheduling

Improvement and Enhancement of YOLOv5 Small Target Recognition Based on Multi-module Optimization

no code implementations 3 Oct 2023 Qingyang Li, Yuchen Li, Hongyi Duan, JiaLiang Kang, Jianan Zhang, Xueqian Gan, Ruotong Xu

In this paper, the limitations of YOLOv5s model on small target detection task are deeply studied and improved.

Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems

1 code implementation 3 May 2023 Xiong-Hui Chen, Bowei He, Yang Yu, Qingyang Li, Zhiwei Qin, Wenjie Shang, Jieping Ye, Chen Ma

However, building a user simulator with no reality-gap, i.e., can predict user's feedback exactly, is unrealistic because the users' reaction patterns are complex and historical logs for each user are limited, which might mislead the simulator-based recommendation policy.

Decision Making Recommendation Systems +1

Fewer is More: Efficient Object Detection in Large Aerial Images

1 code implementation 26 Dec 2022 Xingxing Xie, Gong Cheng, Qingyang Li, Shicheng Miao, Ke Li, Junwei Han

Current mainstream object detection methods for large aerial images usually divide large images into patches and then exhaustively detect the objects of interest on all patches, no matter whether there exist objects or not.

4k Object +2

Spatio-temporal Incentives Optimization for Ride-hailing Services with Offline Deep Reinforcement Learning

no code implementations 6 Nov 2022 Yanqiu Wu, Qingyang Li, Zhiwei Qin

Motivated by this observation, we make an attempt to optimize the distribution of demand to handle this problem by learning the long-term spatio-temporal values as a guideline for pricing strategy.

Deep Reinforcement Learning reinforcement-learning +1

Offline Model-based Adaptable Policy Learning

1 code implementation NeurIPS 2021 Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye

Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies.

Decision Making model +4

Offline Adaptive Policy Leaning in Real-World Sequential Recommendation Systems

no code implementations 1 Jan 2021 Xiong-Hui Chen, Yang Yu, Qingyang Li, Zhiwei Tony Qin, Wenjie Shang, Yiping Meng, Jieping Ye

Instead of increasing the fidelity of models for policy learning, we handle the distortion issue via learning to adapt to diverse simulators generated by the offline dataset.

Sequential Recommendation

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

1 code implementation 2 Apr 2020 Mengyue Yang, Qingyang Li, Zhiwei Qin, Jieping Ye

In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.

Multi-Armed Bandits

Weak Edge Identification Nets for Ocean Front Detection

no code implementations 17 Sep 2019 Qingyang Li, Guoqiang Zhong, Cui Xie

The method uses the stochastic gradient descent and the correlation loss function to obtain a good ocean front image output.

Edge Detection

Long Short-Term Attention

no code implementations 30 Oct 2018 Guoqiang Zhong, Xin Lin, Kang Chen, Qingyang Li, Kai-Zhu Huang

Attention is an important cognition process of humans, which helps humans concentrate on critical information during their perception and learning.

Multi-task Dictionary Learning based Convolutional Neural Network for Computer aided Diagnosis with Longitudinal Images

no code implementations 31 Aug 2017 Jie Zhang, Qingyang Li, Richard J. Caselli, Jieping Ye, Yalin Wang

Firstly, we pre-train CNN on the ImageNet dataset and transfer the knowledge from the pre-trained model to the medical imaging progression representation, generating the features for different tasks.

Dictionary Learning image-classification +3

Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer's Disease Across Multiple Institutions

no code implementations 19 Aug 2016 Qingyang Li, Tao Yang, Liang Zhan, Derrek Paul Hibar, Neda Jahanshad, Yalin Wang, Jieping Ye, Paul M. Thompson, Jie Wang

To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy while ranking the SNPs that may collectively affect AD.

Model Selection

Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

no code implementations 30 Jul 2014 Binbin Lin, Qingyang Li, Qian Sun, Ming-Jun Lai, Ian Davidson, Wei Fan, Jieping Ye

The effectiveness of gene expression pattern annotation relies on the quality of feature representation.
