no code implementations • 14 Dec 2024 • Zhiying Wang, Gang Sun, Yuhui Wang, Hongfang Yu, Dusit Niyato
The Space-Air-Ground Integrated Network (SAGIN) framework is a crucial foundation for future networks, where satellites and aerial nodes assist in computational task offloading.
no code implementations • 25 Oct 2024 • Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang
Jailbreak attacks circumvent LLMs' built-in safeguards by concealing harmful queries within jailbreak prompts.
no code implementations • 12 Jun 2024 • Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).
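As a rough illustration of the value-iteration update that the VIN architecture builds on, here is tabular value iteration on a toy deterministic MDP. The three-state chain, rewards, and discount below are illustrative assumptions, not from the paper:

```python
# Tabular value iteration: V(s) <- max_a [ R(s,a) + gamma * V(s') ]
# on a tiny deterministic chain MDP (an illustrative assumption).
gamma = 0.9
# transitions[s][a] = (next_state, reward)
transitions = {
    0: {"stay": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
    2: {"stay": (2, 0.0)},
}

def value_iteration(transitions, gamma, iters=100):
    V = {s: 0.0 for s in transitions}
    for _ in range(iters):
        # Bellman optimality backup applied synchronously to every state.
        V = {
            s: max(r + gamma * V[s2] for (s2, r) in acts.values())
            for s, acts in transitions.items()
        }
    return V

V = value_iteration(transitions, gamma)
print(V)  # converges to V[0] = 0.9, V[1] = 1.0, V[2] = 0.0
```

A VIN embeds this backup as a differentiable module over a latent MDP, so the planner can be trained end to end.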
no code implementations • 5 Jun 2024 • Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber
To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs.
no code implementations • 28 May 2024 • Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber
We show, however, that such IS-free methods underestimate the optimal value function (VF), especially for large $n$, restricting their capacity to efficiently utilize information from distant future time steps.
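For context, a per-decision importance-sampled (IS) n-step return reweights each reward by the product of policy ratios up to that step; IS-free methods drop these ratio products. The sketch below is a generic illustration of this correction, not the paper's algorithm:

```python
# Per-decision IS-corrected n-step return (illustrative, not the paper's method):
# G = sum_k (prod_{i<=k} rho_i) * gamma^k * r_k, with rho_i = pi(a_i|s_i) / mu(a_i|s_i).
# Dropping the rho products gives the IS-free return discussed in the abstract.
def n_step_is_return(rewards, rhos, gamma=0.99):
    G, w, g = 0.0, 1.0, 1.0
    for r, rho in zip(rewards, rhos):
        w *= rho          # cumulative importance weight up to this step
        G += w * g * r    # discounted, reweighted reward
        g *= gamma
    return G

# Same rewards, but ratios below 1 shrink the contribution of later steps:
print(n_step_is_return([1.0, 1.0], [1.0, 1.0], gamma=0.5))  # 1.5
print(n_step_is_return([1.0, 1.0], [0.5, 0.5], gamma=0.5))  # 0.625
```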
1 code implementation • 23 May 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang
In environments with delayed observation, state augmentation, which includes the actions taken within the delay window, is commonly adopted to restore the Markov property and enable reinforcement learning (RL).
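A minimal sketch of that augmentation: under an observation delay of d steps, the agent's effective state is the last delayed observation together with the d actions taken since it was observed. The names below are illustrative, not from the paper's code:

```python
from collections import deque

# Augmented state under observation delay d:
# (o_{t-d}, a_{t-d}, ..., a_{t-1}) -- the stale observation plus the
# actions taken since, which together make the process Markovian again.
def make_augmented_state(delayed_obs, action_window):
    return (delayed_obs, tuple(action_window))

d = 3
action_window = deque(maxlen=d)  # sliding window of the last d actions
delayed_obs = "s0"               # observation from d steps ago
for a in ["a1", "a2", "a3"]:
    action_window.append(a)

aug = make_augmented_state(delayed_obs, action_window)
print(aug)  # ('s0', ('a1', 'a2', 'a3'))
```

The augmented state space grows with the delay window, which is one reason long delays make RL harder.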
1 code implementation • 5 Feb 2024 • Qingyuan Wu, Simon Sinong Zhan, YiXuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang
To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments.
no code implementations • 21 Dec 2023 • Yuhui Wang, Junaid Farooq
The advent of fifth generation (5G) networks has opened new avenues for enhancing connectivity, particularly in challenging environments like remote areas or disaster-struck regions.
1 code implementation • ICCV 2023 • Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber
Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions.
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of an NLSOM?
1 code implementation • 30 Jan 2023 • Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny
In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).
no code implementations • 7 Nov 2022 • Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong
Automatic Speech Recognition (ASR) systems typically yield output in lexical form.
no code implementations • 13 Apr 2022 • Yuhui Wang, Yingxia Shao, Ang Li
In the era of big data, intellectual-property-oriented scientific and technological resources exhibit large data scale, high information density, and low value density. This poses severe challenges to the effective use of intellectual property resources, and the demand for mining the hidden information they contain keeps growing.
no code implementations • 21 Mar 2022 • Yuhui Wang, Junping Du, Yingxia Shao
This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and provides accurate word vector representations in combination with the BERT language model.
no code implementations • 11 Mar 2022 • Yuhui Wang, Junaid Farooq
Unmanned aerial vehicles (UAVs) are a convenient choice for carrying mobile base stations to rapidly setup communication services for ground users.
1 code implementation • 28 Feb 2022 • Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao
A standard hardware bottleneck when training deep neural networks is GPU memory.
no code implementations • 8 Nov 2021 • Yuhui Wang, Diane J. Cook
Time series data are valuable but are often inscrutable.
1 code implementation • 11 Jun 2021 • Chao Wen, Miao Xu, Zhilin Zhang, Zhenzhe Zheng, Yuhui Wang, Xiangyu Liu, Yu Rong, Dong Xie, Xiaoyang Tan, Chuan Yu, Jian Xu, Fan Wu, Guihai Chen, Xiaoqiang Zhu, Bo Zheng
Third, to deploy MAAB in the large-scale advertising system with millions of advertisers, we propose a mean-field approach.
no code implementations • 23 Feb 2021 • Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan
Most policy evaluation algorithms are based on the Bellman Expectation and Optimality Equations, which give rise to two popular approaches: Policy Iteration (PI) and Value Iteration (VI).
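The two Bellman backups can be contrasted on a toy MDP: the expectation backup evaluates a fixed policy (the core of PI), while the optimality backup maximizes over actions (the core of VI). The two-state MDP and the deliberately suboptimal policy below are illustrative assumptions:

```python
# Bellman backups on a toy deterministic 2-state MDP (illustrative only).
gamma = 0.5
# P[s][a] = (next_state, reward)
P = {0: {"a": (1, 1.0), "b": (0, 0.0)}, 1: {"a": (1, 0.0)}}
policy = {0: "b", 1: "a"}  # a fixed, deliberately suboptimal policy

def expectation_backup(V, s):
    # Bellman expectation: follow the fixed policy (PI's evaluation step).
    s2, r = P[s][policy[s]]
    return r + gamma * V[s2]

def optimality_backup(V, s):
    # Bellman optimality: maximize over actions (VI's update).
    return max(r + gamma * V[s2] for (s2, r) in P[s].values())

V = {0: 0.0, 1: 0.0}
for _ in range(50):
    V = {s: expectation_backup(V, s) for s in P}
V_pi = dict(V)    # value of the fixed policy: {0: 0.0, 1: 0.0}

V = {0: 0.0, 1: 0.0}
for _ in range(50):
    V = {s: optimality_backup(V, s) for s in P}
V_star = dict(V)  # optimal value: {0: 1.0, 1: 0.0}
```

The gap between `V_pi` and `V_star` at state 0 shows why PI must alternate evaluation with improvement, while VI folds the improvement into every backup.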
no code implementations • 15 Jun 2020 • Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh
For the first time, we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting.
1 code implementation • 11 Nov 2019 • Xinghu Yao, Chao Wen, Yuhui Wang, Xiaoyang Tan
Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with the number of agents in such scenarios.
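The exponential growth mentioned above is easy to quantify: with |A| actions per agent and n agents, the joint action space has |A|^n elements. A one-line illustration (the numbers are arbitrary):

```python
# Joint action space size in MARL: |A| actions per agent, n agents -> |A|**n.
def joint_action_space_size(num_actions, num_agents):
    return num_actions ** num_agents

print(joint_action_space_size(5, 2))   # 25
print(joint_action_space_size(5, 10))  # 9765625 -- why a naive CVF does not scale
```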
1 code implementation • 19 Mar 2019 • Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan
Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks.
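For reference, PPO's clipped surrogate objective for a single sample is L = min(ρ·A, clip(ρ, 1−ε, 1+ε)·A), where ρ is the ratio of new to old policy probabilities and A is the advantage. A minimal sketch (ε = 0.25 here only to keep the arithmetic exact; PPO's common default is 0.2):

```python
# PPO clipped surrogate objective for one (state, action) sample:
# L = min(rho * A, clip(rho, 1 - eps, 1 + eps) * A),
# rho = pi_new(a|s) / pi_old(a|s), A = advantage estimate.
def ppo_clip_objective(ratio, advantage, eps=0.2):
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: gains from pushing the ratio beyond 1 + eps are clipped away.
print(ppo_clip_objective(1.5, 2.0, eps=0.25))   # min(3.0, 1.25 * 2.0) = 2.5
# Negative advantage: the min keeps the worse (unclipped) value, a pessimistic bound.
print(ppo_clip_objective(1.5, -2.0, eps=0.25))  # min(-3.0, -2.5) = -3.0
```

The clipping removes the incentive to move the new policy far from the old one, which is PPO's stand-in for an explicit trust-region constraint.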
no code implementations • 15 Feb 2019 • Yuhui Wang, Hao He, Xiaoyang Tan
In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy, and parts of it may be dynamically missing over time, which violates the assumptions of many current methods developed for this setting.
2 code implementations • NeurIPS 2019 • Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
We formally show that this method not only improves the exploration ability within the trust region but enjoys a better performance bound compared to the original PPO as well.