no code implementations • 1 Jun 2023 • Hyunsu Kim, Hyungi Lee, Hongseok Yang, Juho Lee
The key component of our method is what we call an equivariance regularizer for a given type of symmetry, which measures how equivariant a model is with respect to symmetries of that type.
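As a rough illustration of such a regularizer (not the paper's exact formulation), the sketch below estimates, by Monte Carlo sampling, how far a model's outputs are from commuting with a given set of symmetry actions; `sample_action` is a hypothetical helper returning callables that apply a randomly drawn group element to inputs and to outputs.

```python
import torch

def equivariance_regularizer(model, x, sample_action, n_samples=4):
    """Monte Carlo estimate of how far `model` is from being equivariant.

    `sample_action` returns a pair (g_in, g_out) of callables applying a
    randomly drawn group element to inputs and to outputs, respectively.
    A perfectly equivariant model satisfies model(g_in(x)) == g_out(model(x)),
    in which case the returned penalty is zero.
    """
    penalty = 0.0
    for _ in range(n_samples):
        g_in, g_out = sample_action()
        penalty = penalty + (model(g_in(x)) - g_out(model(x))).pow(2).mean()
    return penalty / n_samples
```

The penalty would then be added to the task loss, e.g. `loss = task_loss + lam * equivariance_regularizer(model, x, sample_action)`.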
no code implementations • 24 May 2023 • Moonseok Choi, Hyungi Lee, Giung Nam, Juho Lee
Given the ever-increasing size of modern neural networks, sparse architectures have become increasingly important due to their faster inference and lower memory demands.
no code implementations • 19 Apr 2023 • Hyungi Lee, Eunggu Yun, Giung Nam, Edwin Fong, Juho Lee
Based on this result, instead of assuming any form of the latent variables, we equip an NP with a predictive distribution implicitly defined with neural networks and use the corresponding martingale posteriors as the source of uncertainty.
no code implementations • 19 Apr 2023 • Giung Nam, Sunguk Jang, Juho Lee
Decoupling representation learning and classifier learning has been shown to be effective in classification with long-tailed data.
1 code implementation • 2 Feb 2023 • Francois Caron, Fadhel Ayed, Paul Jung, Hoil Lee, Juho Lee, Hongseok Yang
We consider the optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter.
1 code implementation • 12 Oct 2022 • Balhae Kim, JungWon Choi, Seanie Lee, Yoonho Lee, Jung-Woo Ha, Juho Lee
Finally, we propose a novel Bayesian pseudocoreset algorithm based on minimizing forward KL divergence.
1 code implementation • 5 Oct 2022 • Youngwan Lee, Jeffrey Willette, Jonghee Kim, Juho Lee, Sung Ju Hwang
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
no code implementations • 30 Sep 2022 • Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi
Pre-training a large transformer model on a massive amount of unlabeled data and fine-tuning it on labeled datasets for diverse downstream tasks has proven to be a successful strategy for a variety of vision and natural language processing tasks.
1 code implementation • 26 Aug 2022 • Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions.
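As a minimal sketch of the mini-batch consistency property itself (using simple sum pooling in the style of Deep Sets rather than the attention-based set encoders studied in this line of work), the model below produces the same output whether the set is encoded in one pass or accumulated chunk by chunk.

```python
import torch
import torch.nn as nn

class SumPooledSetFunction(nn.Module):
    """Deep Sets-style model rho(sum_i phi(x_i)). Sum pooling is mini-batch
    consistent: accumulating per-chunk sums gives the same output as
    encoding the whole set at once."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))
        self.rho = nn.Linear(d_hidden, d_out)

    def forward(self, x):                        # x: (n, d_in), full set
        return self.rho(self.phi(x).sum(dim=0))

    def forward_streaming(self, chunks):         # chunks: iterable of (m_i, d_in)
        acc = 0.0
        for c in chunks:
            acc = acc + self.phi(c).sum(dim=0)   # aggregate chunk statistics
        return self.rho(acc)
```

For any partition of the input set, `forward_streaming` agrees with `forward` up to floating-point error, which is exactly the consistency guarantee MBC asks for.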
1 code implementation • 30 Jun 2022 • Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee
Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their deployment in resource-limited environments.
no code implementations • 20 May 2022 • Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Recently, several task augmentation methods have been proposed to tackle this issue using domain-specific knowledge to design augmentation techniques to densify the meta-training task distribution.
no code implementations • 17 May 2022 • Hoil Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang, François Caron
Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals.
no code implementations • 2 Feb 2022 • Pranav Madadi, Jeongho Jeon, Joonyoung Cho, Caleb Lo, Juho Lee, Jianzhong Zhang
In multiple-input multiple-output (MIMO) systems, high-resolution channel state information (CSI) is required at the base station (BS) to ensure optimal performance, especially in multi-user MIMO (MU-MIMO) systems.
no code implementations • NeurIPS 2021 • Giung Nam, Jongmin Yoon, Yoonho Lee, Juho Lee
We propose a simple approach for reducing this gap, i.e., making the distilled performance close to that of the full ensemble.
no code implementations • 12 Oct 2021 • Jeffrey Willette, Hae Beom Lee, Juho Lee, Sung Ju Hwang
Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer.
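One generic way to encourage such behavior, sketched below as a soft penalty on pairwise distance ratios (an assumption for illustration; the cited works may instead rely on spectral normalization or architectural constraints), is to penalize a layer whenever it expands or contracts relative distances outside a band [lower, upper].

```python
import torch

def bi_lipschitz_penalty(x, h, lower=0.5, upper=2.0):
    """Soft bi-Lipschitz penalty on a feature map.

    x: (n, d_in) inputs, h: (n, d_feat) features from one layer.
    Penalizes pairwise distance ratios ||h_i - h_j|| / ||x_i - x_j||
    falling outside [lower, upper], so the layer neither collapses
    nor blows up relative distances between data instances.
    """
    dx = torch.cdist(x, x) + 1e-8
    dh = torch.cdist(h, h)
    ratio = dh / dx
    off_diag = ~torch.eye(x.size(0), dtype=torch.bool, device=x.device)
    ratio = ratio[off_diag]
    return (torch.relu(lower - ratio) + torch.relu(ratio - upper)).mean()
```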
no code implementations • ICLR 2022 • Seanie Lee, Hae Beom Lee, Juho Lee, Sung Ju Hwang
Thanks to the gradients aligned between tasks by our method, the model becomes less vulnerable to negative transfer and catastrophic forgetting.
no code implementations • 29 Sep 2021 • Seungjae Jung, Min-Kyu Kim, Juho Lee, Young-Jin Park, Nahyeon Park, Kyung-Min Kim
Survival analysis appears in various fields such as medicine, economics, engineering, and business.
no code implementations • 29 Sep 2021 • Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, Sung Ju Hwang
Deep learning algorithms are designed to operate on huge volumes of high-dimensional data such as images.
no code implementations • ICLR 2022 • Jeffrey Ryan Willette, Hae Beom Lee, Juho Lee, Sung Ju Hwang
Numerous recent works utilize bi-Lipschitz regularization of neural network layers to preserve relative distances between data instances in the feature spaces of each layer.
1 code implementation • ICLR 2022 • Hyungi Lee, Eunggu Yun, Hongseok Yang, Juho Lee
We show that simply introducing a scale prior on the last-layer parameters can turn infinitely-wide neural networks of any architecture into a richer class of stochastic processes.
1 code implementation • 11 Jun 2021 • Saehoon Kim, Sungwoong Kim, Juho Lee
On the other hand, generative pre-training directly estimates the data distribution, so the representations tend to be robust but not optimal for discriminative tasks.
1 code implementation • 11 Jun 2021 • Jongmin Yoon, Sung Ju Hwang, Juho Lee
Recently, an Energy-Based Model (EBM) trained with Markov chain Monte Carlo (MCMC) has been highlighted as a purification model, where an attacked image is purified by running a long Markov chain using the gradients of the EBM.
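A minimal sketch of such a purification loop is shown below, assuming a trained `energy_net` and illustrative values for the step size, noise scale, and chain length (the actual schedules in the paper may differ).

```python
import torch

def purify(energy_net, x_adv, n_steps=50, step_size=1e-3, noise_scale=1e-2):
    """Run a short Langevin chain that moves a (possibly attacked) image
    toward low-energy regions of the EBM before classification."""
    x = x_adv.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()
        grad, = torch.autograd.grad(energy, x)
        with torch.no_grad():
            x = x - step_size * grad + noise_scale * torch.randn_like(x)
            x = x.clamp(0.0, 1.0)   # keep pixels in a valid range
    return x.detach()
```

The purified image `purify(energy_net, x_adv)` is then passed to the classifier in place of the attacked input.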
no code implementations • 11 Jun 2021 • Jihoon Ko, Taehyung Kwon, Kijung Shin, Juho Lee
However, according to a recent study, a careful choice of pooling functions, which are used for the aggregation and readout operations in GNNs, is crucial for enabling GNNs to extrapolate.
1 code implementation • ACL 2021 • Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang
QA models based on pretrained language models have achieved remarkable performance on various benchmark datasets. However, QA models do not generalize well to unseen data that falls outside the training distribution, due to distributional shifts. Data augmentation (DA) techniques which drop or replace words have been shown to be effective in regularizing the model from overfitting to the training data. Yet, they may adversely affect QA tasks since they incur semantic changes that may lead to wrong answers.
1 code implementation • CVPR 2021 • Jinwoo Kim, Jaehoon Yoo, Juho Lee, Seunghoon Hong
Generative modeling of set-structured data, such as point clouds, requires reasoning over local and global structures at various scales.
Ranked #1 on Point Cloud Generation on ShapeNet Car
no code implementations • NeurIPS 2021 • Bruno Andreis, Jeffrey Willette, Juho Lee, Sung Ju Hwang
The proposed method adheres to the required symmetries of invariance and equivariance as well as maintaining MBC for any partition of the input set.
no code implementations • 22 Feb 2021 • Jeffrey Willette, Juho Lee, Sung Ju Hwang
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
no code implementations • 1 Jan 2021 • Jeffrey Ryan Willette, Juho Lee, Sung Ju Hwang
We demonstrate the effectiveness of our method and validate its performance on both classification and regression problems by applying it to the training of recent state-of-the-art neural network models.
1 code implementation • ICCV 2021 • Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi, Yi Yang
Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains.
no code implementations • AABI Symposium 2021 • Hyunsu Kim, Juho Lee, Hongseok Yang
The non-stationary kernel problem refers to the degraded performance of the algorithm caused by the transition kernel of the chain constantly changing throughout the run.
2 code implementations • 29 Oct 2020 • Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman
While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty.
1 code implementation • NeurIPS 2020 • Yoonho Lee, Juho Lee, Sung Ju Hwang, Eunho Yang, Seungjin Choi
While various complexity measures for deep neural networks exist, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven challenging.
1 code implementation • NeurIPS 2020 • Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, Yee Whye Teh
While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on an assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits the flexibility.
no code implementations • 25 Jun 2020 • Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, Sung Ju Hwang
Deep models are designed to operate on huge volumes of high-dimensional data such as images.
2 code implementations • 9 Jun 2020 • Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang Joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang
Moreover, it is almost infeasible for human annotators to examine attention on large numbers of instances and features.
no code implementations • IJCNLP 2019 • Kyungjae Lee, Sunghyun Park, Hojae Han, Jinyoung Yeo, Seung-won Hwang, Juho Lee
This paper studies the problem of supporting question answering in a new language with limited training resources.
no code implementations • 17 Oct 2019 • Tony Duan, Juho Lee
Generative models of graph structure have applications in biology and social sciences.
no code implementations • ICLR 2020 • Juho Lee, Yoonho Lee, Yee Whye Teh
We propose deep amortized clustering (DAC), a neural architecture which learns to cluster datasets efficiently using a few forward passes.
no code implementations • 26 May 2019 • Juho Lee, Xenia Miscouridou, François Caron
In particular, we show that one can get novel series representations for the generalized gamma process and the stable beta process.
1 code implementation • 13 Feb 2019 • Fadhel Ayed, Juho Lee, François Caron
Bayesian nonparametric approaches, in particular the Pitman-Yor process and the associated two-parameter Chinese Restaurant process, have been successfully used in applications where the data exhibit a power-law behavior.
1 code implementation • 3 Oct 2018 • Juho Lee, Lancelot F. James, Seungjin Choi, François Caron
We consider a non-projective class of inhomogeneous random graph models with interpretable parameters and a number of interesting asymptotic properties.
6 code implementations • 1 Oct 2018 • Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh
Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances.
no code implementations • 27 Sep 2018 • Juho Lee, Saehoon Kim, Jaehong Yoon, Hae Beom Lee, Eunho Yang, Sung Ju Hwang
With such input-independent dropout, each neuron evolves to be generic across inputs, which makes it difficult to sparsify networks without accuracy loss.
2 code implementations • 5 Jun 2018 • Ingyo Chung, Saehoon Kim, Juho Lee, Kwang Joon Kim, Sung Ju Hwang, Eunho Yang
We present a personalized and reliable prediction model for healthcare, which can provide individually tailored medical services such as diagnosis, disease treatment, and prevention.
1 code implementation • 28 May 2018 • Juho Lee, Saehoon Kim, Jaehong Yoon, Hae Beom Lee, Eunho Yang, Sung Ju Hwang
With such input-independent dropout, each neuron evolves to be generic across inputs, which makes it difficult to sparsify networks without accuracy loss.
2 code implementations • ICLR 2019 • Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, Yi Yang
The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class.
2 code implementations • NeurIPS 2018 • Jay Heo, Hae Beom Lee, Saehoon Kim, Juho Lee, Kwang Joon Kim, Eunho Yang, Sung Ju Hwang
The attention mechanism is effective both in focusing deep learning models on relevant features and in interpreting them.
no code implementations • ICLR 2018 • Hae Beom Lee, Juho Lee, Eunho Yang, Sung Ju Hwang
Moreover, the learning of dropout probabilities for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes.
4 code implementations • NeurIPS 2018 • Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang
Moreover, the learning of dropout rates for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes.
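A simplified, hard-mask sketch of this idea is shown below: non-target class logits are dropped with per-class probabilities before the softmax. The actual method learns these probabilities variationally with a continuous relaxation, so the hard Bernoulli sampling and the helper names here are for illustration only.

```python
import torch

def masked_softmax_logits(logits, retain_logits, targets):
    """Randomly drop non-target class logits with learned probabilities.

    logits:        (batch, n_classes) class scores
    retain_logits: (batch, n_classes) parameters of per-class retain probs
    targets:       (batch,) ground-truth class indices

    The target class is always kept; each non-target class is kept with
    probability sigmoid(retain_logits), so the classifier concentrates on
    the most confusing competing classes.
    """
    retain_prob = torch.sigmoid(retain_logits)
    keep = torch.bernoulli(retain_prob)              # hard mask (illustrative only)
    keep.scatter_(1, targets.unsqueeze(1), 1.0)      # never drop the target class
    masked = logits.masked_fill(keep == 0, float('-inf'))
    return masked  # feed into cross-entropy as usual
```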
no code implementations • ICML 2017 • Juho Lee, Creighton Heaukulani, Zoubin Ghahramani, Lancelot F. James, Seungjin Choi
The BFRY random variables are well approximated by gamma random variables in a variational Bayesian inference routine, which we apply to several network datasets for which power law degree distributions are a natural assumption.
no code implementations • NeurIPS 2016 • Juho Lee, Lancelot F. James, Seungjin Choi
Bayesian nonparametric methods based on the Dirichlet process (DP), gamma process, and beta process have proven effective in capturing aspects of various datasets arising in machine learning.
no code implementations • NeurIPS 2015 • Juho Lee, Seungjin Choi
Normalized random measures (NRMs) provide a broad class of discrete random measures that are often used as priors for Bayesian nonparametric models.
no code implementations • 29 Jan 2015 • Juho Lee, Seungjin Choi
Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge.