1 code implementation • 30 Jan 2024 • Nevan Wichers, Carson Denison, Ahmad Beirami
Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses.
no code implementations • 14 Jan 2024 • Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng
We investigate this approach under two different settings: one where the policy model is smaller and is paired with a more powerful critic model, and another where a single language model fulfills both roles.
no code implementations • 15 Nov 2023 • Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning.
no code implementations • 15 Nov 2023 • Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng
Parameter Efficient Tuning has been a prominent approach to adapting Large Language Models to downstream tasks.
no code implementations • 10 Feb 2022 • Dylan Slack, Yinlam Chow, Bo Dai, Nevan Wichers
However, we identify that these techniques are not well equipped for safe policy learning because they ignore negative experiences (e.g., unsafe or unsuccessful), focusing only on positive experiences, which harms their ability to generalize to new tasks safely.
no code implementations • 29 Sep 2021 • Dylan Z Slack, Yinlam Chow, Bo Dai, Nevan Wichers
Though many reinforcement learning (RL) problems involve learning policies in settings where safety constraints are difficult to specify and rewards are sparse, current methods struggle to rapidly and safely acquire successful policies.
no code implementations • 22 Dec 2020 • Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby Lee, Jindong Chen, Blaise Agüera y Arcas
Our methodology is designed to leverage visual, linguistic and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components.
1 code implementation • 14 Feb 2020 • Nevan Wichers
In the real world, RL agents should be rewarded for fulfilling human preferences.
no code implementations • 12 Feb 2020 • Sergei Volodin, Nevan Wichers, Jeremy Nixon
We consider the problem of inferring a causal model of a reinforcement learning environment and we propose a method to deal with spurious correlations.
no code implementations • 24 Oct 2018 • Nevan Wichers, Dilek Hakkani-Tur, Jindong Chen
Images may have elements containing text and a bounding box associated with them, for example, text identified via optical character recognition on a computer screen image, or a natural image with labeled objects.
no code implementations • ICML 2018 • Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
Much recent research has been devoted to video prediction and generation, yet most previous work has demonstrated only limited success in generating videos over short-term horizons.
no code implementations • ICLR 2018 • Nevan Wichers, Dumitru Erhan, Honglak Lee
Much recent research has been devoted to video prediction and generation, but mostly for short-scale time horizons.