no code implementations • 21 Nov 2023 • Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
Large language models (LLMs) are fine-tuned on human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to better align them with users' preferences.
1 code implementation • 30 Apr 2023 • Baiting Zhu, Meihua Dang, Aditya Grover
In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences.
1 code implementation • 15 Apr 2023 • Honghua Zhang, Meihua Dang, Nanyun Peng, Guy Van den Broeck
To overcome this challenge, we propose to use tractable probabilistic models (TPMs) to impose lexical constraints in autoregressive text generation models, which we refer to as GeLaTo (Generating Language with Tractable Constraints).
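The idea can be illustrated with a toy sketch of constraint-guided decoding in the spirit of GeLaTo (all numbers, the tiny "LM," and the auxiliary "constraint model" below are purely illustrative, not the paper's models): the next-token distribution of an autoregressive model is reweighted by each token's estimated probability of leading to a constraint-satisfying completion, as computed by a tractable auxiliary model.

```python
# Toy sketch: reweight an autoregressive model's next-token
# distribution by an auxiliary model's estimate that the constraint
# can still be satisfied. All probabilities here are made up.

def guided_next_token(lm_probs, sat_probs):
    """p(token | prefix, constraint) is proportional to
    p_LM(token | prefix) * p_aux(constraint satisfied | prefix + token)."""
    scores = {t: lm_probs[t] * sat_probs.get(t, 0.0) for t in lm_probs}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

# Hypothetical numbers: the LM prefers "dog", but only "winter" is
# likely to lead to a sentence containing a required keyword.
lm_probs = {"dog": 0.6, "winter": 0.3, "blue": 0.1}
sat_probs = {"dog": 0.05, "winter": 0.9, "blue": 0.1}

posterior = guided_next_token(lm_probs, sat_probs)
print(max(posterior, key=posterior.get))  # "winter"
```

The key point is that the auxiliary model must make the constraint-satisfaction probability tractable to compute, which is what a TPM (e.g., a hidden Markov model) provides and a neural LM does not.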
1 code implementation • 22 Nov 2022 • Meihua Dang, Anji Liu, Guy Van den Broeck
The growing operation increases model capacity by increasing the size of the latent space.
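A minimal sketch of this idea (not the paper's actual algorithm; the mixture representation and perturbation scheme are illustrative assumptions): a "grow" step doubles the number of latent states in a mixture by duplicating each component with a small perturbation, so the enlarged model initially represents nearly the same distribution but has more capacity to fit when retrained.

```python
# Illustrative "grow" step on a discrete mixture: duplicate each
# component (halving its weight) so the latent space doubles in size
# while the represented distribution is initially almost unchanged.
import random

def grow(weights, params, noise=0.01):
    new_w, new_p = [], []
    for w, p in zip(weights, params):
        for _ in range(2):
            new_w.append(w / 2.0)                      # split the weight
            new_p.append(p + random.uniform(-noise, noise))  # perturbed copy
    return new_w, new_p

weights, params = [0.4, 0.6], [0.1, 0.8]
weights, params = grow(weights, params)
print(len(weights))            # 4 latent states instead of 2
print(round(sum(weights), 6))  # weights still sum to 1.0
```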
no code implementations • 29 Sep 2021 • Yuwei Yang, Siqi Ouyang, Meihua Dang, Mingyue Zheng, Lei Li, Hao Zhou
Deep learning models have been widely used in automatic drug design.
no code implementations • 18 Sep 2020 • YooJung Choi, Meihua Dang, Guy Van den Broeck
This is often challenging as the labels in the data are biased.
1 code implementation • 18 Jul 2020 • Meihua Dang, Antonio Vergari, Guy Van den Broeck
Probabilistic circuits (PCs) represent a probability distribution as a computational graph.
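The computational-graph view can be sketched concretely (a minimal illustration with made-up parameters, assuming the standard PC node types): leaves are univariate distributions, product nodes combine independent scopes, and sum nodes form weighted mixtures; evaluating the root on an assignment yields its probability.

```python
# Minimal probabilistic circuit over two binary variables A and B.
# Leaves are Bernoulli distributions; products multiply children over
# disjoint scopes; sums are weighted mixtures (weights sum to 1).

class Leaf:
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def eval(self, x):
        return self.p_true if x[self.var] else 1.0 - self.p_true

class Product:
    def __init__(self, children):
        self.children = children
    def eval(self, x):
        out = 1.0
        for c in self.children:
            out *= c.eval(x)
        return out

class Sum:
    def __init__(self, weighted_children):  # list of (weight, child)
        self.weighted_children = weighted_children
    def eval(self, x):
        return sum(w * c.eval(x) for w, c in self.weighted_children)

# A mixture of two fully factorized components over {A, B}.
pc = Sum([
    (0.3, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.7, Product([Leaf("A", 0.1), Leaf("B", 0.6)])),
])

# The circuit defines a valid distribution: it sums to 1 over all
# four assignments of (A, B).
total = sum(pc.eval({"A": a, "B": b}) for a in (0, 1) for b in (0, 1))
print(round(total, 10))  # 1.0
```

Structural properties of this graph (smoothness, decomposability) are what make marginals and other queries computable in time linear in the circuit size.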