13 Jul 2023 • Sho Shimoyama, Tetsuro Morimura, Kenshi Abe, Toda Takamichi, Yuta Tomomatsu, Masakazu Sugiyama, Asahi Hentona, Yuuki Azuma, Hirotaka Ninomiya
One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL).
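As a rough illustration of that idea, and not the authors' implementation, the sketch below alternates updates of a reward estimator and a policy on a toy problem. Everything in it is an assumption made for brevity: a single dialog state with three discrete actions, a logistic reward estimator with one logit per action, a softmax policy, exact expectations in place of sampled rollouts, and the hypothetical names `train_adversarial`, `expert_actions`, `lr_reward`, and `lr_policy`.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_adversarial(expert_actions, n_actions, n_iters=1000,
                      disc_steps=5, lr_reward=0.5, lr_policy=0.1):
    """Toy adversarial learning loop (assumed setup, single dialog state)."""
    # Empirical action distribution of the collected data.
    p_expert = np.bincount(expert_actions, minlength=n_actions) / len(expert_actions)
    theta = np.zeros(n_actions)  # policy logits
    w = np.zeros(n_actions)      # reward-estimator logits, used as rewards

    for _ in range(n_iters):
        pi = softmax(theta)
        # Reward estimator: gradient ascent on
        # E_data[log D(a)] + E_policy[log(1 - D(a))], with D = sigmoid(w).
        for _ in range(disc_steps):
            d = sigmoid(w)
            w += lr_reward * (p_expert * (1.0 - d) - pi * d)
        # Policy: exact policy gradient of the expected estimated reward
        # E_pi[w], using the softmax derivative pi_a * (w_a - E_pi[w]).
        theta += lr_policy * pi * (w - pi @ w)

    return softmax(theta), w

# Hypothetical collected data: action 2 is chosen most often.
expert_actions = [2, 2, 2, 2, 0, 2, 2, 1, 2, 2]
policy, reward = train_adversarial(expert_actions, n_actions=3)
print("learned policy:", np.round(policy, 3))
print("estimated reward logits:", np.round(reward, 3))
```

In this sketch the reward estimator acts as a discriminator between collected and policy-generated actions, and the policy is pushed toward actions the estimator scores highly. A real dialog system would condition both models on the dialog state and estimate the gradients from sampled dialogs.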