no code implementations • 13 Feb 2024 • Mohammad Ghazi Vakili, Christoph Gorgulla, AkshatKumar Nigam, Dmitry Bezrukov, Daniel Varoli, Alex Aliper, Daniil Polykovsky, Krishna M. Padmanabha Das, Jamie Snider, Anna Lyakisheva, Ardalan Hosseini Mansob, Zhong Yao, Lela Bitar, Eugene Radchenko, Xiao Ding, Jinxin Liu, Fanye Meng, Feng Ren, Yudong Cao, Igor Stagljar, Alán Aspuru-Guzik, Alex Zhavoronkov
The discovery of small molecules with therapeutic potential is a long-standing challenge in chemistry and biology.
no code implementations • 29 Jan 2024 • Ziqi Zhang, Jingzehua Xu, Jinxin Liu, Zifeng Zhuang, Donglin Wang
On the other hand, Decision Transformer (DT) abstracts decision-making as sequence modeling and shows competitive performance on offline RL benchmarks. However, recent studies demonstrate that DT lacks stitching capability, so endowing DT with stitching is vital to further improving its performance.
no code implementations • 18 Jan 2024 • Jinxin Liu, Petar Djukic, Michel Kulhandjian, Burak Kantarci
We propose Deep Dict, a deep learning-based lossy time series compressor designed to achieve a high compression ratio while maintaining decompression error within a predefined range.
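Deep Dict itself learns a deep representation of the series; as a minimal illustration of the predefined-error guarantee only (not the paper's method), uniform quantization with step `2 * max_err` caps the per-sample decompression error at a chosen bound:

```python
def quantize(values, max_err):
    # Uniform quantization: rounding to the nearest multiple of
    # step = 2 * max_err guarantees |reconstruction - original| <= max_err.
    step = 2.0 * max_err
    return [round(v / step) for v in values]

def dequantize(codes, max_err):
    # Map integer codes back to the representative values.
    step = 2.0 * max_err
    return [c * step for c in codes]
```

The integer codes are then typically entropy-coded; the error bound holds regardless of the downstream lossless stage.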
no code implementations • 11 Jan 2024 • Jinxin Liu, Shulin Cao, Jiaxin Shi, Tingjian Zhang, Lei Hou, Juanzi Li
Extensive experiments with models of different sizes and in different formal languages show that today's state-of-the-art LLMs' understanding of logical forms can approach human level overall, but there is still plenty of room for improvement in generating correct logical forms. This suggests that it is more effective to use LLMs to generate natural language training data to reinforce a small model than to answer questions with LLMs directly.
no code implementations • 12 Dec 2023 • Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, Donglin Wang, Shuai Zhang
Different from previous clipping approaches, we treat increasing the maximum cumulative return in reinforcement learning (RL) tasks as the preference of the RL task, and propose a bi-level proximal policy optimization paradigm that not only optimizes the policy but also dynamically adjusts the clipping bound to reflect this preference, further improving the training outcomes and stability of PPO.
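For context, the standard PPO clipped surrogate that the clipping bound `clip_eps` enters can be sketched as below; the bi-level, preference-driven adjustment of `clip_eps` is the paper's contribution and is not reproduced in this illustration:

```python
def clipped_surrogate(ratio, advantage, clip_eps):
    # PPO's clipped surrogate objective for a single sample.
    # ratio = pi_new(a|s) / pi_old(a|s); clip_eps is the clipping bound
    # that the bi-level scheme above would adjust dynamically.
    unclipped = ratio * advantage
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum keeps the update pessimistic (a lower bound).
    return min(unclipped, clipped_ratio * advantage)
```

A larger `clip_eps` permits bigger policy updates per step at the cost of stability, which is why the bound is a natural knob to tune against the return preference.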
1 code implementation • NeurIPS 2023 • Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma
We theoretically analyze the influence of the generated expert data and the improvement of generalization.
no code implementations • 7 Oct 2023 • Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, Jinxin Liu, Donglin Wang
Offline-to-online RL can make full use of pre-collected offline datasets to initialize policies, resulting in higher sample efficiency and better performance than using online algorithms alone for policy training.
no code implementations • 3 Sep 2023 • Jinxin Liu, Murat Simsek, Michele Nogueira, Burak Kantarci
Timely response of Network Intrusion Detection Systems (NIDS) is constrained by the flow generation process, which requires the accumulation of network packets.
1 code implementation • 19 Jul 2023 • Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
Due to the existence of the similarity trap, such consistency regularization improperly increases the probability that the model's predictions are consistent between segment pairs, and thus reduces the confidence in reward learning, since the augmented distribution does not match the original one in PbRL.
no code implementations • 26 Jun 2023 • Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo
To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.
no code implementations • NeurIPS 2023 • Jinxin Liu, Hongyin Zhang, Zifeng Zhuang, Yachen Kang, Donglin Wang, Bin Wang
Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts like reward-conditioned policy: (q1) What information should we transfer from the inner-level to the outer-level?
no code implementations • 23 Jun 2023 • Jinxin Liu, Lipeng Zu, Li He, Donglin Wang
As a remedy for the labor-intensive labeling, we propose to endow offline RL tasks with a few expert data and utilize the limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards.
1 code implementation • 22 Jun 2023 • Jinxin Liu, Ziqi Zhang, Zhenyu Wei, Zifeng Zhuang, Yachen Kang, Sibo Gai, Donglin Wang
Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data.
1 code implementation • 15 Jun 2023 • Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan YAO, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations.
1 code implementation • 25 May 2023 • Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang
Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectively.
1 code implementation • 23 May 2023 • Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
In this paper, we present the first benchmark that simulates the evaluation of open information extraction models in the real world, where the syntactic and expressive distributions under the same knowledge meaning may drift variously.
2 code implementations • 22 Feb 2023 • Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo
Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs.
no code implementations • 8 Oct 2022 • Ji Qi, Bin Xu, Kaisheng Zeng, Jinxin Liu, Jifan Yu, Qi Gao, Juanzi Li, Lei Hou
Document-level relation extraction with graph neural networks faces a fundamental graph construction gap between training and inference: the golden graph structure is only available during training, which forces most methods to adopt heuristic or syntactic rules to construct a prior graph as a pseudo proxy.
no code implementations • 7 Apr 2022 • Zhiyan Chen, Jinxin Liu, Yu Shen, Murat Simsek, Burak Kantarci, Hussein T. Mouftah, Petar Djukic
Advanced persistent threat (APT) is a prominent means for cybercriminals to compromise networks, and it is notable for its long-term and harmful characteristics.
no code implementations • ICLR 2022 • Jinxin Liu, Hongyin Zhang, Donglin Wang
Specifically, DARA emphasizes learning from those source transition pairs that are adaptive for the target environment and mitigates the offline dynamics shift by characterizing state-action-next-state pairs instead of the typical state-action distribution sketched by prior offline RL methods.
no code implementations • NeurIPS 2021 • Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
Unsupervised reinforcement learning aims to acquire skills without prior goal representations, where an agent automatically explores an open-ended environment to represent goals and learn the goal-conditioned policy.
no code implementations • 21 Oct 2021 • Yachen Kang, Jinxin Liu, Xin Cao, Donglin Wang
To achieve this, the widely used GAN-inspired IRL method is adopted, and its discriminator, which recognizes policy-generated trajectories, is modified to quantify the dynamics difference.
no code implementations • 29 Aug 2021 • Jinxin Liu, Murat Simsek, Burak Kantarci, Melike Erol-Kantarci, Andrew Malton, Andrew Walenstein
The risk levels are associated with access control decisions recommended by a security policy.
no code implementations • 11 Apr 2021 • Jinxin Liu, Donglin Wang, Qiangxing Tian, Zhengyu Chen
It is important for an agent to learn a widely applicable and general-purpose policy that can achieve diverse goals, including images and text descriptions.
no code implementations • 25 Sep 2019 • Qiangxing Tian, Jinxin Liu, Donglin Wang
By maximizing an information theoretic objective, a few recent methods empower the agent to explore the environment and learn useful skills without supervision.
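One concrete instance of such an information-theoretic objective (an illustration of the general family, not necessarily this paper's exact formulation) is the DIAYN-style intrinsic reward, which pays the agent for reaching states from which its current skill is identifiable:

```python
import math

def skill_reward(q_z_given_s, p_z):
    # DIAYN-style intrinsic reward: log q(z|s) - log p(z).
    # q_z_given_s: a learned discriminator's probability of the active
    # skill z given the current state s; p_z: the skill prior.
    # Maximizing this in expectation maximizes a variational lower
    # bound on the mutual information I(S; Z) between states and skills.
    return math.log(q_z_given_s) - math.log(p_z)
```

The reward is positive exactly when the discriminator identifies the skill better than the prior would by chance, so each skill is pushed toward distinguishable regions of the state space.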