1 code implementation • 22 Oct 2024 • MuHan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Simon Stepputtis, Joseph Campbell, Katia Sycara
Reinforcement learning from human feedback is a successful technique that can mitigate such issues, however, the collection of human feedback can be laborious.
no code implementations • 8 Mar 2022 • Ti Bai, MuHan Lin, Xiao Liang, Biling Wang, Michael Dohopolski, Bin Cai, Dan Nguyen, Steve Jiang
A test time optimization (TTO) technique was proposed to further improve the DL models' performance.