no code implementations • Findings (EMNLP) 2021 • Sen Yang, Qingyu Zhou, Dawei Feng, Yang Liu, Chao Li, Yunbo Cao, Dongsheng Li
Moreover, this task can be used to improve visual question generation and visual question answering.
no code implementations • 31 Mar 2025 • Qiang Wang, Dawei Feng, Xu Zhang, Ao Shen, Yang Xu, Bo Ding, Huaimin Wang
Instruction tuning has emerged as a principal method for tailoring the behavior of LLMs.
no code implementations • 23 Oct 2024 • Zheng Luo, Ming Feng, Zijian Gao, Jinyang Yu, Liang Hu, Tao Wang, Shenao Xue, Shen Zhou, Fangping Ouyang, Dawei Feng, Kele Xu, Shanshan Wang
The emergence of deep learning (DL) has provided great opportunities for the high-throughput analysis of atomic-resolution micrographs.
no code implementations • 9 Oct 2024 • Huanxi Liu, Jiaqi Liao, Dawei Feng, Kele Xu, Huaimin Wang
AutoFeedback achieves an accuracy of 100.00% on a real-world API dataset and reduces the cost of interaction with GPT-3.5 Turbo by 23.44% and with GPT-4 Turbo by 11.85%.
no code implementations • 23 May 2024 • Yuanzhao Zhai, Zhuo Zhang, Kele Xu, Hanyang Peng, Yue Yu, Dawei Feng, Cheng Yang, Bo Ding, Huaimin Wang
To overcome this limitation, we propose Online Self-Preferring (OSP) language models to learn from self-generated response pairs and self-judged preference strengths.
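The snippet below is a minimal sketch of how a self-judged preference strength could weight a pairwise preference loss over a self-generated response pair; the function name, the DPO-style formulation, and the strength weighting are assumptions for illustration, not the paper's released objective.

```python
import torch
import torch.nn.functional as F

def osp_pair_loss(logp_a, logp_b, ref_logp_a, ref_logp_b, strength, beta=0.1):
    """Sketch under assumptions: the model itself judged response A preferred
    over B with confidence `strength` in [0, 1]; a DPO-style logistic loss on
    the log-probability margin is weighted by that self-judged strength."""
    margin = beta * ((logp_a - ref_logp_a) - (logp_b - ref_logp_b))
    return strength * -F.logsigmoid(margin)

# Example: strongly self-preferred pair contributes more to the update.
loss = osp_pair_loss(torch.tensor(-5.0), torch.tensor(-7.0),
                     torch.tensor(-5.5), torch.tensor(-6.8),
                     strength=0.9)
```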
no code implementations • 16 May 2024 • Dawei Feng, Yihai Zhang, Zhixuan Xu
In this article, we propose the Information Gain Optimized Tokenizer (IGOT), which analyzes the special token set of a downstream task, constructs a new subset using a heuristic function $\phi$ over each special token and its information gain, builds a new domain-specific tokenizer from that subset, and then continues pretraining on the downstream task data.
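As a rough illustration, the sketch below ranks candidate domain tokens with a simple information-gain-style heuristic and adds the top-scoring ones to a base tokenizer before continued pretraining; the stand-in heuristic `phi`, the whitespace candidate extraction, and all hyper-parameters are assumptions, not the paper's exact $\phi$.

```python
from collections import Counter
from transformers import AutoTokenizer

def igot_candidates(corpus, base_tok, top_k=100):
    """Rank candidate domain tokens by a simple information-gain-style score:
    frequency in the domain corpus times the number of subword pieces saved
    relative to the base tokenizer (a stand-in for the paper's phi)."""
    counts = Counter(w for line in corpus for w in line.split())
    def phi(token, freq):
        n_subwords = len(base_tok.tokenize(token))
        return freq * max(n_subwords - 1, 0)
    scored = sorted(counts.items(), key=lambda kv: phi(*kv), reverse=True)
    return [tok for tok, _ in scored[:top_k]]

base = AutoTokenizer.from_pretrained("gpt2")
corpus = ["nephrotoxicity was observed in the treated cohort",
          "serum creatinine increased after dosing"]
base.add_tokens(igot_candidates(corpus, base))
# model.resize_token_embeddings(len(base))  # then continue pretraining on domain data
```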
no code implementations • 11 Jan 2024 • Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization.
no code implementations • 30 Dec 2023 • Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs).
no code implementations • 24 Aug 2022 • Zijian Gao, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as the intrinsic reward.
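A minimal sketch of the reward computation described above, assuming the predictions of the saved snapshots for one state are stacked into a matrix whose nuclear norm measures their disagreement; the centering step and any scaling are assumptions for illustration.

```python
import torch

def nuclear_norm_bonus(snapshot_preds):
    """snapshot_preds: list of K tensors of shape (D,), the predictions of K
    saved snapshots of the self-supervised model for the same state.
    Stronger disagreement across snapshots yields a larger nuclear norm,
    hence a larger intrinsic reward."""
    P = torch.stack(snapshot_preds)                    # (K, D)
    return torch.linalg.svdvals(P - P.mean(dim=0)).sum()

# Example with three snapshots of a 16-dimensional prediction:
reward = nuclear_norm_bonus([torch.randn(16) for _ in range(3)])
```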
no code implementations • 24 Aug 2022 • Zijian Gao, Yiying Li, Kele Xu, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
Curiosity arises when the memorized information cannot handle the current state; the information gap between the dual learners is formulated as the intrinsic reward for the agent, and the state information is then consolidated into the dynamic memory.
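Below is a minimal sketch of one way the dual-learner idea could be instantiated, with a fast learner and a slowly consolidated "memory" copy; the network architecture, the squared-error gap, and the EMA consolidation rate are assumptions for illustration, not the paper's released design.

```python
import copy
import torch
import torch.nn as nn

class DualCuriosity(nn.Module):
    """Sketch under assumptions: a fast learner encodes the current state, a
    slow 'dynamic memory' copy holds consolidated knowledge. Their prediction
    gap serves as the intrinsic reward; the memory is then consolidated
    toward the learner with a slow exponential moving average."""
    def __init__(self, obs_dim, feat_dim=64, tau=0.01):
        super().__init__()
        self.learner = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.memory = copy.deepcopy(self.learner)
        for p in self.memory.parameters():
            p.requires_grad_(False)
        self.tau = tau

    def forward(self, obs):
        # Information gap between the dual learners -> intrinsic reward.
        return (self.learner(obs) - self.memory(obs)).pow(2).mean(-1)

    @torch.no_grad()
    def consolidate(self):
        # Slowly fold the learner's knowledge into the dynamic memory.
        for pm, pl in zip(self.memory.parameters(), self.learner.parameters()):
            pm.mul_(1 - self.tau).add_(self.tau * pl)
```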
no code implementations • 21 May 2022 • Chao Chen, Zijian Gao, Kele Xu, Sen Yang, Yiying Li, Bo Ding, Dawei Feng, Huaimin Wang
To handle the sparsity of extrinsic rewards in reinforcement learning, researchers have proposed intrinsic rewards, which enable the agent to learn skills that may come in handy for pursuing rewards in the future, for example by encouraging the agent to visit novel states.
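For concreteness, here is a simple count-based novelty bonus, one common instantiation of such intrinsic rewards (and not the method proposed in this paper); the discretization of states and the bonus scale are assumptions for illustration.

```python
from collections import defaultdict

class CountNoveltyBonus:
    """Rarely visited (discretized) states receive a larger bonus,
    which is added to the sparse extrinsic reward."""
    def __init__(self, beta=0.1):
        self.counts = defaultdict(int)
        self.beta = beta

    def __call__(self, state_key):
        self.counts[state_key] += 1
        return self.beta / self.counts[state_key] ** 0.5

bonus = CountNoveltyBonus()
# total_reward = extrinsic_reward + bonus(tuple(discretized_obs))
```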
1 code implementation • 5 Jul 2021 • Zhishan Zhao, Sen Yang, Guohui Liu, Dawei Feng, Kele Xu
As a critical component of online advertising and marketing, click-through rate (CTR) prediction has drawn considerable attention from both industry and academia.
no code implementations • ACL 2019 • Sen Yang, Dawei Feng, Linbo Qiao, Zhigang Kan, Dongsheng Li
Traditional approaches to the task of ACE event extraction usually depend on manually annotated data, which is often laborious to create and limited in size.
no code implementations • 16 Oct 2018 • Kele Xu, Haibo Mi, Dawei Feng, Huaimin Wang, Chuan Chen, Zibin Zheng, Xu Lan
Valuable training data is often owned by independent organizations and located in multiple data centers.
no code implementations • 12 Jun 2018 • Dawei Feng, Kele Xu, Haibo Mi, Feifan Liao, Yan Zhou
In this paper, we explore the use of a multi-scale densely connected convolutional neural network (DenseNet) for the classification task, with the goal of improving classification performance, as multi-scale features can be extracted from the time-frequency representation of the audio signal.
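A minimal sketch of such a pipeline, stacking log-mel spectrograms computed at several FFT sizes as input channels for a DenseNet classifier; the FFT sizes, hop length, number of classes, the placeholder file name, and the use of torchvision's densenet121 are assumptions for illustration, not the paper's exact setup.

```python
import librosa
import numpy as np
import torch
from torchvision.models import densenet121

def multi_scale_logmel(path, n_mels=128, ffts=(1024, 2048, 4096)):
    """Compute log-mel spectrograms at several time-frequency resolutions
    and stack them as channels of a single input tensor."""
    y, sr = librosa.load(path, sr=None)
    mels = []
    for n_fft in ffts:
        m = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                           hop_length=512, n_mels=n_mels)
        mels.append(librosa.power_to_db(m))
    T = min(m.shape[1] for m in mels)                  # align time axes
    return np.stack([m[:, :T] for m in mels])          # (3, n_mels, T)

model = densenet121(weights=None, num_classes=10)      # e.g. 10 audio classes
x = torch.tensor(multi_scale_logmel("scene.wav"))[None].float()
logits = model(x)
```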
no code implementations • 18 May 2018 • Kele Xu, Dawei Feng, Haibo Mi, Boqing Zhu, Dezhi Wang, Lilun Zhang, Hengxing Cai, Shuwen Liu
Audio scene classification, the problem of predicting class labels of audio scenes, has drawn considerable attention in recent years.