While Transformers have had significant success in paragraph generation, they treat sentences as linear sequences of tokens and often neglect their hierarchical information.
Here we explore the efficacy of dense supervision in unconditional generation and find that generator feature maps can serve as an alternative to costly semantic label maps.
Understanding temporal dynamics of video is an essential aspect of learning better video representations.
Then, the sparse attention is applied to the node sequences for learning node representations with a reduced computational cost.
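As a minimal, generic sketch of the idea (not the paper's exact sparsity pattern), sliding-window sparse attention restricts each position in the node sequence to a local neighborhood, reducing the quadratic attention cost; all names here are illustrative:

```python
import numpy as np

def windowed_attention(x, window=2):
    """Each position attends only to neighbors within `window`,
    so cost is O(n * window) instead of O(n^2)."""
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        keys = x[lo:hi]                    # local neighborhood only
        scores = keys @ x[i] / np.sqrt(d)  # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ keys            # attention-weighted sum
    return out
```

With a window covering the whole sequence this reduces to ordinary dense attention, which is a convenient sanity check.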
In this work, we propose a new evaluation metric, called `rarity score', to measure the individual rarity of each image synthesized by generative models.
The great success of machine learning with massive amounts of data comes at a price of huge computation costs and storage for training and tuning.
These models have shown a significant increase in inference speed, but at the cost of lower QA performance compared to the retriever-reader models.
Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability.
A large body of continual learning (CL) methods, however, assumes data streams with clean labels, while online learning scenarios under noisy data streams remain underexplored.
We also propose a simple and effective semi-supervised learning strategy with generated samples from MH-Aug. Our extensive experiments demonstrate that MH-Aug can generate a sequence of samples according to the target distribution to significantly improve the performance of GNNs.
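The claim above — generating a sequence of samples according to a target distribution — is the core of Metropolis-Hastings sampling. As a hedged, generic sketch (not MH-Aug's graph-specific proposal or target), the accept/reject loop looks like:

```python
import math
import random

def metropolis_hastings(log_target, x0, proposal, n_steps, seed=0):
    """Generic MH sampler: propose x' ~ proposal(x) and accept with
    probability min(1, target(x')/target(x)) for a symmetric proposal."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = proposal(x, rng)
        log_alpha = log_target(x_new) - log_target(x)
        if math.log(rng.random()) < log_alpha:
            x = x_new                # accept the proposal
        samples.append(x)            # otherwise keep the current state
    return samples
```

For example, with `log_target = lambda x: -x * x / 2` and a symmetric uniform proposal, the chain's samples approximate a standard normal distribution.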
In this paper, we find that the recently emerging paradigm of implicit neural representations (INRs), which encodes a continuous signal into a parameterized neural network, effectively mitigates this issue.
Specifically, we map the input of a generator, which was sampled from the categorical distribution, to the embedding space of the discriminator and let them act as a cluster centroid.
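A minimal sketch of this mapping, assuming a simple table lookup (the embedding table and dimensions are illustrative, not the paper's architecture): each categorical code indexes a learned row that serves as a cluster centroid in the discriminator's embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, embed_dim = 10, 16

# Shared embedding table: row c serves as the centroid for class c.
centroids = rng.normal(size=(n_classes, embed_dim))

def embed_code(class_ids):
    """Map categorical generator inputs to embedding-space centroids
    by table lookup."""
    return centroids[class_ids]

codes = rng.integers(0, n_classes, size=32)   # sampled categorical inputs
z = embed_code(codes)                         # (32, 16): one centroid each
```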
Here we explore the possibility of general-purpose user representation learning by training a universal user encoder at large scales.
For better practicality, we first propose a novel continual learning setup that is online, task-free, and class-incremental, has blurry task boundaries, and is subject to inference queries at any moment.
Specifically, we argue the importance of both diversity and purity of examples in the episodic memory of continual learning models.
2 code implementations • Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo, Minsuk Chang, Soobin Suh, Sookyo In, Jinseong Park, Kyungduk Kim, Hiun Kim, Jisu Jeong, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee, Jaewook Kang, Inho Kang, Jung-Woo Ha, WooMyoung Park, Nako Sung
GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds-of-billions-scale data.
3 code implementations • 20 May 2021 • Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Jiyoon Han, Jangwon Park, Chisung Song, JunSeong Kim, Yongsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, InKwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, Kyunghyun Cho
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.
Results show that the reward optimization with BLEURT is able to increase the metric scores by a large margin, in contrast to limited gain when training with smoothed BLEU.
The prevalent scenario of continual learning, however, assumes disjoint sets of classes as tasks and is less realistic and rather artificial.
Our method learns to learn a primary task alongside various auxiliary tasks to improve generalization performance.
Assessing advertisements, specifically on the basis of user preferences and ad quality, is crucial to the marketing industry.
Recent advances in pre-trained language models have significantly improved neural response generation.
With experiments on reading comprehension, we show that BLANC outperforms the state-of-the-art QA models, and the performance gap increases as the number of answer text occurrences increases.
Language model pre-training has shown promising results in various downstream tasks.
Label noise is a critical factor that degrades the generalization performance of deep neural networks, thus leading to severe issues in real-world problems.
Ranked #15 on Image Classification on Clothing1M (using extra training data)
Our proposed method learns to learn a primary task by predicting meta-paths as auxiliary tasks.
no code implementations • 6 Jul 2020 • Sang-Woo Lee, Hyunhoon Jung, SukHyun Ko, Sunyoung Kim, Hyewon Kim, Kyoungtae Doh, Hyunjung Park, Joseph Yeo, Sang-Houn Ok, Joonhaeng Lee, Sungsoon Lim, Minyoung Jeong, Seongjae Choi, SeungTae Hwang, Eun-Young Park, Gwang-Ja Ma, Seok-Joo Han, Kwang-Seung Cha, Nako Sung, Jung-Woo Ha
Tracking suspected cases of COVID-19 is crucial to suppressing the spread of the COVID-19 pandemic.
The cost of annotating transcriptions for large speech corpora is a bottleneck to fully exploiting the potential capacity of deep neural network-based automatic speech recognition models.
Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers.
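The key property here is that for a scale-invariant loss f(w) = g(w / ||w||), rescaling the weights leaves the gradient direction unchanged and only scales its magnitude: grad f(cw) = grad f(w) / c. This can be checked numerically with finite differences (the function `f` below is an arbitrary scale-invariant example, not the paper's loss):

```python
import numpy as np

def f(w):
    """A scale-invariant function: depends only on the direction of w."""
    u = w / np.linalg.norm(w)
    return np.sin(u[0]) + u[1] ** 2

def num_grad(fn, w, eps=1e-6):
    """Central-difference numerical gradient."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (fn(w + e) - fn(w - e)) / (2 * eps)
    return g

w = np.array([0.7, -1.3, 0.4])
g1 = num_grad(f, w)
g2 = num_grad(f, 2 * w)   # gradient at the rescaled point
# Scale invariance implies g2 == g1 / 2: same direction, halved magnitude,
# so rescaling only changes the effective step size of a GD update.
```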
1 code implementation • 20 Apr 2020 • Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim
Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services.
Musical onset detection can be formulated as a time-to-event (TTE) or time-since-event (TSE) prediction task by defining music as a sequence of onset events.
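A hedged sketch of what such a formulation's training targets might look like (the function and the `cap` parameter for frames with no event are illustrative, not the paper's exact scheme): each frame gets a time-to-next-onset (TTE) and time-since-last-onset (TSE) value.

```python
def tte_tse_targets(onsets, cap=99):
    """For each frame, compute time-to-next-onset (TTE) and
    time-since-last-onset (TSE); `cap` bounds frames with no event."""
    n = len(onsets)
    tte, nxt = [cap] * n, None
    for i in range(n - 1, -1, -1):   # scan right-to-left for the next onset
        if onsets[i]:
            nxt = i
        tte[i] = (nxt - i) if nxt is not None else cap
    tse, prev = [cap] * n, None
    for i in range(n):               # scan left-to-right for the last onset
        if onsets[i]:
            prev = i
        tse[i] = (i - prev) if prev is not None else cap
    return tte, tse
```

For example, `tte_tse_targets([0, 1, 0, 0, 1, 0])` yields TTE `[1, 0, 2, 1, 0, 99]` and TSE `[99, 0, 1, 2, 0, 1]`.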
A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains.
Ranked #1 on Image-to-Image Translation on AFHQ
Here we describe NL2pSQL, a new task of generating pSQL code from natural language questions over under-specified databases.
We first assume that the priors of future samples can be generated in an independently and identically distributed (i.i.d.) manner.
Beyond the crucial influence of image quality, auxiliary information about ad images, such as tags and target subjects, can also determine image preference.
The oversmoothing of GNNs is an obstacle for GNN-based social recommendation as well.
The checkerboard phenomenon is one of the well-known visual artifacts in the computer vision field.
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of the spectrogram while reusing the phase from noisy speech for reconstruction.
Answerer in Questioner's Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems.
We present a hybrid framework that leverages the trade-off between temporal and frequency precision in audio representations to improve performance on the speech enhancement task.
Audio and Speech Processing • Sound
Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance.
The deep learning boom has led many industries and academic communities to competitively adopt machine learning-based approaches.
Predicting the time to the next event is an important task in various domains.
A recommender system aims to recommend items that a user is interested in among many items.
However, researchers are still required to perform a non-trivial amount of manual tasks such as GPU allocation, training status tracking, and comparison of models with different hyperparameter settings.
Predicting high-risk vascular diseases is a significant issue in the medical domain.
To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model.
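StarGAN's single generator is conditioned on the target domain; one common way to do this (and a simplified sketch of the idea, with illustrative shapes) is to tile a one-hot domain label over the spatial dimensions and concatenate it channel-wise with the input image:

```python
import numpy as np

def with_domain_label(image, domain_id, n_domains):
    """Condition a single generator on the target domain by tiling a
    one-hot domain label over space and concatenating it channel-wise."""
    c, h, w = image.shape
    label = np.zeros((n_domains, h, w), dtype=image.dtype)
    label[domain_id] = 1.0   # one-hot plane for the target domain
    return np.concatenate([image, label], axis=0)

x = np.random.default_rng(0).normal(size=(3, 8, 8)).astype(np.float32)
g_in = with_domain_label(x, domain_id=2, n_domains=5)   # (3 + 5, 8, 8)
```

Because the domain is an input rather than baked into the weights, one generator handles translations to every domain.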
Ranked #1 on Image-to-Image Translation on RaFD (using extra training data)
In this paper, we present a supervised feature learning approach using artist labels annotated in every single track as objective meta data.
Sound • Audio and Speech Processing
Recommender systems aim to find an accurate and efficient mapping from historic data of user-preferred items to a new item that is to be liked by a user.
Catastrophic forgetting is the problem in which a neural network loses the information of the first task after being trained on a second task.
We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language.
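A simplified sketch of the dual-attention idea, assuming a shared memory vector that drives both attention steps (the elementwise memory update and all shapes are illustrative simplifications, not the paper's exact architecture):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(features, memory):
    """Weight feature vectors by dot-product affinity to a shared
    memory vector and return the attended summary plus the weights."""
    weights = softmax(features @ memory)
    return weights @ features, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(10, 8))   # visual region features
tokens = rng.normal(size=(6, 8))     # textual token features
memory = rng.normal(size=8)          # shared memory vector

v, _ = attend(regions, memory)       # attended visual context
t, _ = attend(tokens, memory)        # attended textual context
memory = memory + v * t              # simplified joint memory update
```

Iterating the two attention steps lets each modality's focus refine the other's in subsequent rounds.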
Ranked #2 on Visual Question Answering on VQA v1 test-dev
We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of the deep residual learning.
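A minimal sketch of one multimodal residual block, assuming multiplicative fusion of the two modalities with a shortcut on the question features (the weight names `Wq`, `Wv`, `Wo` and the dimensions are illustrative):

```python
import numpy as np

def mrn_block(q, v, Wq, Wv, Wo):
    """One multimodal residual block: a joint residual function of the
    question and visual features, added to the question shortcut."""
    joint = np.tanh(Wq @ q) * np.tanh(Wv @ v)   # multiplicative fusion
    return q + Wo @ joint                        # residual shortcut on q

rng = np.random.default_rng(0)
d = 8
q, v = rng.normal(size=d), rng.normal(size=d)
Wq, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(3))
q = mrn_block(q, v, Wq, Wv, Wo)   # deeper models stack several blocks
```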