To tackle this problem, previous work on code summarization (the task of automatically generating a description of a given piece of code) reported that an auxiliary learning model trained to produce API (Application Programming Interface) embeddings showed promising results when applied to a downstream code summarization model.
To alleviate this problem, we propose DraftRec, a novel hierarchical model which recommends champions by considering each player's champion preferences and the interaction between the players.
While NeRF-based 3D-aware image generation methods enable viewpoint control, limitations remain that hinder their adoption in various 3D applications.
In order to perform unconditional video generation, we must learn the distribution of the real-world videos.
While recent NeRF-based generative models can generate diverse 3D-aware images, these approaches have limitations when generating images that contain user-specified characteristics.
Image-based virtual try-on provides the capacity to transfer a clothing item onto a photo of a given person, which is usually accomplished by warping the item to a given human pose and adjusting the warped item to the person.
In this paper, we present a large-scale animation celebfaces dataset (AnimeCeleb), built via controllable synthetic animation models, to boost research in the animation face domain.
During the fine-tuning phase of transfer learning, the pretrained vocabulary remains unchanged, while model parameters are updated.
Despite the unprecedented improvement of face recognition, existing face recognition models still show considerably low performance in determining whether a pair of child and adult images belongs to the same identity.
Despite the impressive performance of deep networks in vision, language, and healthcare, unpredictable behavior on samples from a distribution different from the training distribution causes severe problems in deployment.
Hence, we explore the practical single-positive setting, where each data instance is annotated with only one positive label and no explicit negative labels.
In this paper, we propose a novel distance-based BCR method suitable for OSR, which limits the feature space of known-class data in a class-wise manner and then pushes background-class samples far away from the limited feature space.
The former normalizes the input to fix its distribution in terms of the mean and variance, while the latter returns the output to the original distribution.
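This normalize-then-denormalize pattern can be sketched minimally in NumPy (the function names here are illustrative, not taken from the paper):

```python
import numpy as np

def normalize(x):
    """Shift and scale x to zero mean and unit variance; keep the stats."""
    mean, std = x.mean(), x.std()
    return (x - mean) / std, (mean, std)

def denormalize(y, stats):
    """Map a normalized output back to the original distribution."""
    mean, std = stats
    return y * std + mean

x = np.array([2.0, 4.0, 6.0, 8.0])
z, stats = normalize(x)      # z has zero mean, unit variance
x_back = denormalize(z, stats)  # recovers the original distribution
```

The key point is that the statistics removed by the first step are stored so the second step can restore them.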
To address this, we introduce a new class of neural stochastic processes, Decoupled Kernel Neural Processes (DKNPs), which explicitly learn separate mean and kernel functions to directly model the covariance between output variables in a data-driven manner.
Although previous approaches pre-define the type of dataset bias to prevent the network from learning it, identifying the bias type in a real dataset is often infeasible.
Successful sequential recommendation systems rely on accurately capturing the user's short-term and long-term interests.
However, the distribution of max logits of each predicted class is significantly different from each other, which degrades the performance of identifying unexpected objects in urban-scene segmentation.
Ranked #5 on Anomaly Detection on Fishyscapes L&F
In multi-modal dialogue systems, it is important to allow the use of images as part of a multi-turn conversation.
Deep neural networks for automatic image colorization often suffer from the color-bleeding artifact, a problematic color spreading near the boundaries between adjacent objects.
To this end, our method learns the disentangled representation of (1) the intrinsic attributes (i.e., those inherently defining a certain class) and (2) bias attributes (i.e., peripheral attributes causing the bias), from a large number of bias-aligned samples, the bias attributes of which have strong correlation with the target variable.
While the existing cycle-consistency loss ensures that the image can be translated back, our approach makes the model further preserve the attribute-irrelevant regions even in a single translation to another domain by using the Grad-CAM output computed from the discriminator.
For example, it is difficult to figure out which models provide state-of-the-art performance, as recently proposed models have often been evaluated with different datasets and experiment environments.
Understanding voluminous historical records provides clues on the past in various aspects, such as social and political issues and even natural science facts.
The task of image-based virtual try-on aims to transfer a target clothing item onto the corresponding region of a person, which is commonly tackled by fitting the item to the desired body part and fusing the warped item with the person.
Enhancing the generalization capability of deep neural networks to unseen domains is crucial for safety-critical applications in the real world such as autonomous driving.
In response, we introduce a novel large-scale Korean hairstyle dataset, K-hairstyle, containing 500,000 high-resolution images.
Learning visual representations using large-scale unlabelled images is a holy grail for most computer vision tasks.
This paper considers neural networks as novel steganographic cover media, which we call stego networks, that can be used to hide one's secret messages.
We evaluate our method against existing ones in terms of the quality of generated questions as well as the fine-tuned MRC model accuracy after training on the data synthetically generated by our method.
Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)
We evaluate the out-of-distribution (OOD) detection performance of self-supervised learning (SSL) techniques with a new evaluation framework.
Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine.
To address this issue, this paper presents a novel meta-learning algorithm for unsupervised neural machine translation (UNMT) that trains the model to adapt to another domain by utilizing only a small amount of training data.
By interpreting the forward dynamics of the latent representation of neural networks as an ordinary differential equation, Neural Ordinary Differential Equations (Neural ODEs) emerged as an effective framework for modeling system dynamics in the continuous time domain.
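A toy sketch of this idea: treat the hidden state h(t) as obeying dh/dt = f(h, t) and integrate with a fixed-step Euler solver. All names below are illustrative assumptions; real Neural ODE implementations use adaptive solvers and the adjoint method for backpropagation.

```python
import numpy as np

def dynamics(h, t, W):
    """A tiny parameterized dynamics function: dh/dt = tanh(W @ h)."""
    return np.tanh(W @ h)

def odeint_euler(f, h0, t0, t1, steps, W):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    h, t = h0.copy(), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t, W)  # one Euler step
        t += dt
    return h

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))   # "learned" parameters (random here)
h0 = rng.standard_normal(4)             # initial latent state
h1 = odeint_euler(dynamics, h0, 0.0, 1.0, steps=100, W=W)
```

Here the "depth" of the network corresponds to integration time, so intermediate states at any t in [0, 1] are defined.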
Video generation models often operate under the assumption of fixed frame rates, which leads to suboptimal performance when it comes to handling flexible frame rates (e.g., increasing the frame rate of the more dynamic portion of the video as well as handling missing video frames).
HyperTendril takes a novel approach to effectively steering hyperparameter optimization through an iterative, interactive tuning procedure that allows users to refine the search spaces and the configuration of the AutoML method based on their own insights from given results.
This paper addresses the problem that pixel embeddings in proposal-free, instance-segmentation-based lane detection are difficult to optimize.
Ranked #9 on Lane Detection on TuSimple
However, it is difficult to prepare a training dataset that has a sufficient amount of semantically meaningful pairs of images as well as the ground truth for a colored image reflecting a given reference (e.g., coloring a sketch of an originally blue car given a reference green car).
Real-world question answering systems often retrieve documents potentially relevant to a given question through a keyword search, followed by a machine reading comprehension (MRC) step to find the exact answer from them.
This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation for urban-scene images.
Ranked #10 on Semantic Segmentation on Cityscapes test
Despite remarkable success in unpaired image-to-image translation, existing systems still require a large amount of labeled images.
Predicting road traffic speed is a challenging task due to different types of roads, abrupt speed change and spatial dependencies between roads; it requires the modeling of dynamically changing spatial dependencies among roads and temporal patterns over long input sequences.
Disentangling content and style information of an image has played an important role in recent success in image translation.
Here we describe NL2pSQL, a new task of generating pSQL code from natural language questions over under-specified databases.
This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias.
This paper proposes a novel generative model called PUGAN, which progressively synthesizes high-quality audio in a raw waveform.
We evaluate the question generation capability of our method by comparing the BLEU score with existing methods and test our method by fine-tuning the MRC model on the downstream MRC data after training on synthetic data.
Attention networks, a deep neural network architecture inspired by humans' attention mechanism, have seen significant success in image captioning, machine translation, and many other applications.
First, those methods extract style from an entire exemplar, including noisy information that impedes a translation model from properly extracting the intended style of the exemplar.
Despite recent advances, deep learning-based automatic colorization methods are still limited when it comes to few-shot learning.
Recently, image-to-image translation has seen significant success.
First, we use a content representation from the source domain conditioned on a style representation from the target domain.
However, applying this approach in image translation is computationally intensive and error-prone due to the expensive time complexity and its non-trivial backpropagation.
Machine reading comprehension helps machines learn to utilize most of the human knowledge written in the form of text.
Ranked #11 on Question Answering on TriviaQA
Predicting the time to the next event is an important task in various domains.
Machine comprehension question answering, which finds an answer to the question given a passage, involves high-level reasoning processes of understanding and tracking the relevant contents across various semantic units such as words, phrases, and sentences in a document.
Therefore, our design study aims to provide a visual analytics solution to increase interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers.
Recently, generative adversarial networks (GANs) have shown promising performance in generating realistic images.
This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette.
Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever.
To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model.
Ranked #1 on Image-to-Image Translation on RaFD (using extra training data)
MMGAN finds two manifolds representing the vector representations of real and fake images.
Our experimental results using source codes demonstrate that our proposed model is capable of accurately detecting simple buffer overruns.
Embedding and visualizing large-scale high-dimensional data in a two-dimensional space is an important problem since such visualization can reveal deep insights out of complex data.
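As a minimal baseline for such a two-dimensional embedding (a plain PCA projection, not the method proposed in the paper), one can project the data onto its top two principal components with NumPy:

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the top-2 principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    # SVD of the centered data; rows of Vt are the principal directions,
    # sorted by decreasing explained variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                    # (n_samples, 2) embedding

X = np.random.default_rng(0).standard_normal((100, 50))
Y = pca_2d(X)  # each row is a 2D coordinate suitable for a scatter plot
```

Linear projections like this often fail to reveal the nonlinear structure that motivates more sophisticated embedding methods, which is precisely the gap such visualization research targets.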