1 code implementation • 6 Feb 2024 • Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling. They incorporate gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention.
no code implementations • 19 Oct 2023 • Youngkyu Lee, Jongho Park, Chang-Ock Lee
The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers.
no code implementations • 20 Sep 2023 • Ilias Diakonikolas, Sushrut Karmalkar, Jongho Park, Christos Tzamos
Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$.
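To see why the target is $g(w^* \cdot x)$ rather than $y$, assume (for illustration only) measurements $y = g(w^* \cdot x) + \xi$, where the noise $\xi$ is zero-mean, has finite variance, and is independent of $x$, and measure error in squared loss. The error against the noisy labels then splits into the target error plus a noise floor:

$$
\begin{aligned}
\mathbb{E}\big[(g(w\cdot x) - y)^2\big]
&= \mathbb{E}\big[(g(w\cdot x) - g(w^*\cdot x))^2\big]
 - 2\,\mathbb{E}\big[g(w\cdot x) - g(w^*\cdot x)\big]\,\mathbb{E}[\xi]
 + \mathbb{E}[\xi^2] \\
&= \mathbb{E}\big[(g(w\cdot x) - g(w^*\cdot x))^2\big] + \mathbb{E}[\xi^2].
\end{aligned}
$$

The first term can be driven toward zero by a good $w$, while the error against $y$ never falls below $\mathbb{E}[\xi^2]$. The squared loss and the noise assumptions are illustrative; the paper's noise model and error measure may differ.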
no code implementations • 17 May 2023 • Jongho Park, Jinchao Xu
We propose a new training algorithm, named DualFL (Dualized Federated Learning), for solving distributed optimization problems in federated learning.
1 code implementation • 8 May 2023 • Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, Kangwook Lee
In this paper, we propose MPC (Modular Prompted Chatbot), a new approach for creating high-quality conversational agents without the need for fine-tuning.
no code implementations • 4 May 2023 • Minwoo Lee, Kyu Tae Kim, Jongho Park
From the drift and diffusion terms of the Fokker–Planck equation, unknown parameters of the system are identified.
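As a rough illustration of what "identifying parameters from the drift and diffusion terms" can mean, the sketch below assumes a one-dimensional Ornstein–Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$, whose Fokker–Planck equation $\partial_t p = \partial_x(\theta x\, p) + \tfrac{\sigma^2}{2}\partial_x^2 p$ has drift $-\theta x$ and diffusion coefficient $\sigma^2/2$. It simulates a path with Euler–Maruyama and recovers $\theta$ and $\sigma^2$ with classical moment (Kramers–Moyal-style) estimators. This is a swapped-in textbook technique, not the method of this entry, and `simulate_ou` and `estimate_linear_drift_and_diffusion` are hypothetical names.

```python
import math
import random


def simulate_ou(theta, sigma, x0, dt, n_steps, seed=0):
    """Euler–Maruyama path of dX = -theta * X dt + sigma dW (hypothetical helper)."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        x = x - theta * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def estimate_linear_drift_and_diffusion(xs, dt):
    """Moment estimates assuming drift -theta * x and constant sigma (hypothetical helper).

    Returns (theta_hat, sigma2_hat); the Fokker–Planck diffusion coefficient
    under the convention above is sigma2_hat / 2.
    """
    dx = [b - a for a, b in zip(xs, xs[1:])]
    x = xs[:-1]
    # First moment: least-squares fit of E[dX | X = x] ≈ -theta * x * dt.
    theta_hat = -sum(xi * di for xi, di in zip(x, dx)) / (dt * sum(xi * xi for xi in x))
    # Second moment: E[dX**2 | X = x] ≈ sigma**2 * dt for small dt.
    sigma2_hat = sum(di * di for di in dx) / (dt * len(dx))
    return theta_hat, sigma2_hat
```

For example, `estimate_linear_drift_and_diffusion(simulate_ou(1.0, 0.5, 0.0, 0.01, 200_000), 0.01)` should return values close to `(1.0, 0.25)`; the drift estimate is the noisier of the two because it depends on the total simulated time rather than on the number of steps.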
no code implementations • 11 Oct 2021 • Youngkyu Lee, Jongho Park, Chang-Ock Lee
In this paper, we propose a new convolution methodology called "two-level" group convolution that is robust to an increase in the number of groups and suitable for multi-GPU parallel computation.
no code implementations • NeurIPS 2021 • Ilias Diakonikolas, Jongho Park, Christos Tzamos
This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise.
no code implementations • 16 Mar 2021 • Chang-Ock Lee, Youngkyu Lee, Jongho Park
We observe that the layers of a DNN can be interpreted as the time steps of a time-dependent problem and can therefore be parallelized by emulating a parallel-in-time algorithm called parareal.
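For readers unfamiliar with parareal, the sketch below applies it to a scalar ODE $y' = f(t, y)$ rather than to DNN layers. It assumes explicit Euler for both the cheap coarse propagator $G$ (one step per time slice) and the accurate fine propagator $F$ (many steps per slice), and iterates the standard correction $U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)$. The helper names `euler` and `parareal` are hypothetical, and this is not the DNN-specific scheme of this entry.

```python
def euler(f, y, t0, t1, steps):
    """Explicit Euler from t0 to t1 with a fixed number of steps."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_slices, n_iters, fine_steps=100):
    """Parareal with a 1-step Euler coarse solver G and a fine_steps-step Euler fine solver F."""
    ts = [t0 + (t1 - t0) * n / n_slices for n in range(n_slices + 1)]

    def G(y, n):  # cheap, inaccurate propagator over slice n
        return euler(f, y, ts[n], ts[n + 1], 1)

    def F(y, n):  # expensive, accurate propagator over slice n
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one sequential sweep of the coarse solver.
    U = [y0]
    for n in range(n_slices):
        U.append(G(U[n], n))

    for _ in range(n_iters):
        # These fine solves are independent across slices, so they could run in parallel.
        fine = [F(U[n], n) for n in range(n_slices)]
        coarse_old = [G(U[n], n) for n in range(n_slices)]
        # Cheap sequential correction sweep.
        new_U = [y0]
        for n in range(n_slices):
            new_U.append(G(new_U[n], n) + fine[n] - coarse_old[n])
        U = new_U
    return ts, U
```

After $k$ iterations the first $k$ slice endpoints coincide (up to rounding) with a sequential run of the fine solver, so `n_iters = n_slices` always reproduces it; a real speedup requires convergence in far fewer iterations than there are slices.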
no code implementations • 10 Feb 2020 • Zifan Liu, Jongho Park, Theodoros Rekatsinas, Christos Tzamos
We study the problem of robust mean estimation and introduce a novel Hamming distance-based measure of distribution shift for coordinate-level corruptions.
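To make "coordinate-level corruptions" concrete, the sketch below corrupts individual entries of a data matrix rather than whole rows, reports the fraction of changed entries as a toy, entry-level analogue of a Hamming distance, and compares the plain mean with the coordinate-wise median, a standard simple robust baseline. This is an illustration under stated assumptions, not the paper's distribution-level measure or estimator; `corrupt_entries`, `hamming_fraction`, `coordinate_mean`, and `coordinate_median` are hypothetical names.

```python
import random
import statistics


def corrupt_entries(X, frac, value, rng):
    """Overwrite a `frac` share of individual entries (not whole rows) with `value`."""
    Y = [row[:] for row in X]
    cells = [(i, j) for i in range(len(X)) for j in range(len(X[0]))]
    for i, j in rng.sample(cells, int(frac * len(cells))):
        Y[i][j] = value
    return Y


def hamming_fraction(X, Y):
    """Toy entry-level Hamming distance: fraction of entries that differ."""
    total = sum(len(row) for row in X)
    diff = sum(a != b for rx, ry in zip(X, Y) for a, b in zip(rx, ry))
    return diff / total


def coordinate_mean(X):
    return [statistics.fmean(col) for col in zip(*X)]


def coordinate_median(X):
    # Each coordinate is estimated separately, so a corrupted entry only affects its own column.
    return [statistics.median(col) for col in zip(*X)]
```

With 2000 standard Gaussian samples in 5 dimensions and 10% of entries set to 100, the per-coordinate means are pulled to roughly 10, while the coordinate-wise medians stay near 0. Because corruptions are spread over entries, many rows may each contain only one bad coordinate, which is why row-based corruption models can misdescribe this setting.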