no code implementations • 26 Sep 2022 • Hyunjae Lee, Gihyeon Lee, Junhwan Kim, Sungjun Cho, Dohyun Kim, Donggeun Yoo
However, it often results in selecting a sub-optimal configuration as training with the high-performing configuration typically converges slowly in an early phase.