no code implementations • 8 Feb 2023 • Cheolhyoung Lee, Kyunghyun Cho
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
1 code implementation • 3 Oct 2022 • Eugene Choi, Kyunghyun Cho, Cheolhyoung Lee
We then propose a non-monotonic self-terminating language model, which addresses the issue of non-terminating sequences under incomplete probable decoding algorithms by significantly relaxing the constraint of a monotonically increasing termination probability imposed by the original self-terminating language model of Welleck et al. (2020).
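A minimal sketch of one way to realize such a relaxed termination constraint (illustrative only; the function `eos_probability`, the logit `u_t`, and the specific floor 1 − (1 − ε)^t are assumptions for this sketch, not necessarily the paper's exact parameterization):

```python
import torch

def eos_probability(u_t: torch.Tensor, t: int, eps: float = 1e-3) -> torch.Tensor:
    """Hypothetical <eos> probability at decoding step t.

    The floor 1 - (1 - eps)^t rises toward 1 as t grows, so decoding must
    eventually terminate, yet the probability itself is free to move
    non-monotonically with the context-dependent logit u_t.
    """
    floor = 1.0 - (1.0 - eps) ** t                    # lower bound, -> 1 as t grows
    return 1.0 - (1.0 - floor) * torch.sigmoid(u_t)   # always >= floor
```

Under this construction the termination probability can dip and rise with `u_t` across steps, unlike the strictly monotone schedule of the original self-terminating formulation, while still guaranteeing termination in the limit.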
2 code implementations • ICLR 2020 • Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
We empirically evaluate the proposed mixout and its variants by fine-tuning a pretrained language model on downstream tasks.
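As a rough illustration, mixout can be pictured as dropout whose "drop" target is the pretrained parameter rather than zero. A minimal sketch, assuming the formulation in the paper (the function name and rescaling convention here are illustrative):

```python
import torch

def mixout(w: torch.Tensor, w_pre: torch.Tensor, p: float) -> torch.Tensor:
    """Sketch of mixout(p): with probability p, swap each current parameter
    for its pretrained value, then rescale so the result is an unbiased
    estimate of the current parameters (as inverted dropout does for zero)."""
    mask = torch.bernoulli(torch.full_like(w, p))   # 1 -> use pretrained value
    mixed = mask * w_pre + (1.0 - mask) * w
    return (mixed - p * w_pre) / (1.0 - p)          # E[return value] == w
```

Setting `w_pre` to zero recovers ordinary (inverted) dropout, which is the correspondence the technique builds on.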
no code implementations • ICLR 2019 • Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
We empirically verify this result using deep convolutional networks and observe that the gradient stochasticity correlates more strongly with the proposed directional uniformity than with the gradient norm stochasticity, suggesting that the directional statistics of minibatch gradients are a major factor behind SGD.
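One way to probe this kind of directional statistic (a hedged sketch: the mean-resultant-length measure below is a generic quantity from directional statistics, not necessarily the paper's exact estimator):

```python
import torch

def directional_uniformity(grads: list[torch.Tensor]) -> float:
    """Illustrative measure of how uniformly minibatch gradient *directions*
    spread over the unit sphere: the mean resultant length of the normalized
    gradients is ~1 when they all point the same way and ~0 when their
    directions are uniform, so we report its complement."""
    units = torch.stack([g.flatten() / g.flatten().norm() for g in grads])
    resultant = units.mean(dim=0).norm().item()   # in [0, 1]
    return 1.0 - resultant
```

Here `grads` would hold the gradients of the same parameters computed on different minibatches at a fixed iterate.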