In this work, we propose applying abstract meaning representation (AMR) based semantic parsing models to parse textual descriptions of a visual scene into scene graphs; to the best of our knowledge, this is the first work to do so.
We present a new form of ensemble method, Devil's Advocate, which uses a deliberately dissenting model to force the other submodels within the ensemble to collaborate better.
Recent GAN-based text-to-image generation models have advanced to the point where they can generate photo-realistic images that semantically match their descriptions.
Furthermore, the qualitative analysis shows that the unadapted VG model often fails to find correct objects due to a strong bias learned from the pre-training data.
We evaluate our method on the first-person video benchmark dataset TREK-150 and on a custom dataset, RMOT-223, that we collected from a UR5e robot.
Specifically, our model learns to predict the target moment from the joint probability of the given query and the complement of negative queries for each candidate frame.
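As a rough formalization (the notation here is ours, not necessarily the paper's): for a candidate frame $f_t$, a given query $q$, and negative queries $\{\bar{q}_j\}$, the moment score could take the form

$$ s(f_t) = P(q \mid f_t) \prod_j \bigl(1 - P(\bar{q}_j \mid f_t)\bigr), $$

so a frame is favored when it matches the given query while failing to match the negative queries.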
Tasks that involve interaction with various targets are called multi-target tasks.
Our method, coined Learning by Sketching (LBS), learns to convert an image into a set of colored strokes that explicitly incorporate the geometric information of the scene in a single inference step without requiring a sketch dataset.
Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples.
In Self-Supervised Learning (SSL), it is known that frequent collisions, in which a target sample and its negative samples share the same class, can decrease performance.
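As a back-of-the-envelope illustration (ours, not the paper's): with $C$ balanced classes and $K$ negatives drawn uniformly at random, each negative shares the target's class with probability $1/C$, so the expected number of collisions per target is

$$ \mathbb{E}[\#\,\text{collisions}] = \frac{K}{C}, $$

which grows quickly as batches get larger or the number of classes shrinks.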
Video corpus moment retrieval (VCMR) is the task of retrieving the most relevant video moment from a large video corpus using a natural language query.
Recently, adversarial imitation learning has emerged as a scalable reward acquisition method for inverse reinforcement learning (IRL) problems.
To this end, we design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning representation): 1) transforming a textual description of an image into an AMR graph (Text-to-AMR) and 2) feeding the AMR graph to a Transformer-based language model to generate a scene graph (AMR-to-SG).
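As a schematic illustration only, a minimal sketch of this two-stage pipeline (the stub functions and toy outputs below are hypothetical stand-ins for the seq2seq models, not the released implementation):

```python
def text_to_amr(caption: str) -> str:
    """Stage 1 (Text-to-AMR): parse the caption into a linearized AMR graph.
    A real system would call a pretrained seq2seq AMR parser here."""
    # Toy output for the caption "a boy throws a ball":
    return "(throw-01 :ARG0 (boy) :ARG1 (ball))"

def amr_to_scene_graph(amr: str) -> list:
    """Stage 2 (AMR-to-SG): a Transformer-based language model would take the
    linearized AMR and generate scene-graph triples; we hard-code one triple
    to show the target format."""
    return [("boy", "throws", "ball")]

if __name__ == "__main__":
    amr = text_to_amr("a boy throws a ball")
    print(amr_to_scene_graph(amr))  # [('boy', 'throws', 'ball')]
```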
We compare our approach with Unlikelihood (UL) training in a text continuation task on commonsense natural language inference (NLI) corpora to show which method better models coherence by avoiding unlikely continuations.
We present an automated learning framework for a robotic sketching agent that is capable of learning stroke-based rendering and motor control simultaneously.
The initial years of an infant's life are known as the critical period, during which neural plasticity significantly impacts the overall development of learning ability.
Additionally, we propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner based on cross-modal relational graph networks.
As a result, GST scales the amount of training data up to an order of magnitude beyond VisDial (from 1.2M to 12.9M QA pairs).
Knowledge-based visual question answering (QA) aims to answer a question which requires visually-grounded external knowledge beyond image content itself.
To validate this hypothesis, we adapt this notion of critical periods to learning in AI agents and investigate critical periods for AI agents in a virtual environment.
Face-swapping models have been drawing attention for their compelling generation quality, but their complex architectures and loss functions often require careful tuning for successful training.
Learning in a multi-target environment without prior knowledge about the targets requires a large number of samples and makes generalization difficult.
In this paper, we propose the Video Turing Test to provide effective and practical assessments of video understanding intelligence as well as human-likeness evaluation of AI agents.
In this paper, we challenge existing multiple-choice video question answering by reformulating it as open-ended video question answering.
Then we propose a top-down evaluation system for VideoQA, based on the cognitive process of humans and story elements: Cognitive Modules for Evaluation (CogME).
MASN consists of a motion module, an appearance module, and a motion-appearance fusion module.
Assessing advertisements, specifically on the basis of user preferences and ad quality, is crucial to the marketing industry.
One of the inherent limitations of current AI systems, stemming from passive learning mechanisms (e.g., supervised learning), is that they perform well on labeled datasets but cannot deduce knowledge on their own.
To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.
The formulation draws a strong connection between adversarial learning and energy-based reinforcement learning; thus, the architecture is capable of recovering a reward function that induces a multi-modal policy.
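One well-known instance of such a connection, shown here purely as an illustration and not necessarily this paper's exact formulation, is the AIRL-style discriminator that embeds a learned reward-like function $f_\theta$ against the current policy $\pi$:

$$ D_\theta(s, a) = \frac{\exp(f_\theta(s, a))}{\exp(f_\theta(s, a)) + \pi(a \mid s)}, $$

so that at optimality $f_\theta$ behaves as an energy-based (maximum-entropy) reward.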
Through the re-training process, some of the noise can be compensated for, while other noise can be utilized to learn better representations.
Active learning is widely used to reduce labeling effort and training time by repeatedly querying only the most beneficial samples from unlabeled data.
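For concreteness, a minimal uncertainty-sampling sketch of one such query step (a common acquisition rule shown as an assumption; the sklearn-style `predict_proba` interface and pool names are illustrative, not this paper's method):

```python
import numpy as np

def select_queries(model, pool_x, k=10):
    """Pick the k most uncertain unlabeled samples to query for labels."""
    probs = model.predict_proba(pool_x)    # (n_pool, n_classes)
    uncertainty = 1.0 - probs.max(axis=1)  # least-confident score per sample
    return np.argsort(-uncertainty)[:k]    # indices of the k most uncertain samples
```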
In the end, we show not only that we can build a better machine training framework based on the human experiment results, but also that the human results can be empirically confirmed through imitated machine experiments: human-like active learning has a crucial effect on learning performance.
Inspired by recent trends in vision and language learning, we explore applications of attention mechanisms for visio-lingual fusion in the context of story-based video understanding.
In experiments, the integrated scene graph is applied to image-caption retrieval as a downstream task.
We propose an in silico molecular associative memory model for pattern learning, storage, and denoising using a Pairwise Markov Random Field (PMRF) model.
Despite recent progress in computer vision and natural language processing, developing a machine that can understand video stories remains hard due to their intrinsic difficulty.
Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context.
Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.
Next, we gradually add random noise to the word representations and repeat the training process from scratch, initializing with the noised word representations.
While image quality has a crucial influence, auxiliary information about ad images, such as tags and target subjects, can also determine image preference.
We present a generative adversarial network (GAN) that conducts manifold learning and alignment (MLA): the task of learning the multi-manifold structure underlying data and aligning those manifolds without any correspondence information.
Generative replay (GR) is a method to alleviate catastrophic forgetting in continual learning (CL) by generating previous task data and learning it together with the data from new tasks.
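A minimal sketch of one generative-replay update, under assumed interfaces (the `old_generator.sample(n)` helper, returning inputs and pseudo-labels for past tasks, is hypothetical):

```python
import torch

def generative_replay_step(model, old_generator, new_batch, opt, replay_ratio=1.0):
    """Mix real new-task data with samples replayed from a generator
    trained on previous tasks, then take one classifier update."""
    x_new, y_new = new_batch
    n_replay = int(len(x_new) * replay_ratio)
    x_old, y_old = old_generator.sample(n_replay)  # replayed past-task data
    x = torch.cat([x_new, x_old])
    y = torch.cat([y_new, y_old])
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```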
However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods.
We present an encoder-powered generative adversarial network (EncGAN) that is able to learn both the multi-manifold structure and the abstract features of data.
Problem difficulty was operationalized by the number of carries involved in solving a given problem.
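For concreteness, a small helper (ours, not from the paper) that computes this difficulty measure; for example, 12 + 34 involves no carries while 58 + 67 involves two:

```python
def count_carries(a: int, b: int) -> int:
    """Count carry operations when adding two non-negative integers digit by digit."""
    carries, carry = 0, 0
    while a > 0 or b > 0:
        carry = 1 if (a % 10 + b % 10 + carry) >= 10 else 0
        carries += carry
        a //= 10
        b //= 10
    return carries

assert count_carries(12, 34) == 0  # no carries
assert count_carries(58, 67) == 2  # two carries
```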
Video understanding is emerging as a new paradigm for studying human-like AI.
Specifically, the REFER module learns latent relationships between a given question and the dialog history by employing a self-attention mechanism.
Exploiting deep generative models' remarkable ability to learn the data-manifold structure, some recent studies have proposed geometric data interpolation methods based on geodesic curves on the learned data manifold.
While conventional methods for sequential learning focus on interactions between consecutive inputs, we propose a new method that captures composite semantic flows with variable-length dependencies.
Ablation studies confirm that the dual attention mechanism combined with late fusion achieves the best performance.
The task of multi-image cued story generation, as in the visual storytelling dataset (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images.
In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly.
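A minimal single-glimpse sketch of a bilinear attention map in this spirit (the dimensions, projections, and softmax normalization below are our assumptions for illustration, not the exact BAN implementation):

```python
import torch
import torch.nn.functional as F

class BilinearAttention(torch.nn.Module):
    """Single-glimpse bilinear attention over (object, token) pairs.
    X: visual features (B, n_obj, d_v); Y: question features (B, n_tok, d_q)."""
    def __init__(self, d_v, d_q, d_h):
        super().__init__()
        self.U = torch.nn.Linear(d_v, d_h)  # projects visual features
        self.V = torch.nn.Linear(d_q, d_h)  # projects question features
        self.p = torch.nn.Linear(d_h, 1)    # pools the joint space to a logit

    def forward(self, X, Y):
        # Low-rank bilinear pooling: every (object, token) pair interacts
        # through a Hadamard product in the projected space.
        joint = self.U(X).unsqueeze(2) * self.V(Y).unsqueeze(1)  # (B, n_obj, n_tok, d_h)
        logits = self.p(joint).squeeze(-1)                       # (B, n_obj, n_tok)
        B, n_obj, n_tok = logits.shape
        # Normalize over all object-token pairs to obtain the attention map.
        return F.softmax(logits.view(B, -1), dim=-1).view(B, n_obj, n_tok)
```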
Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take.
The parameter domain of the loss surface can be decomposed into regions in which activation values (zero or one for rectified linear units) are consistent.
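As a one-unit illustration (our notation, not the paper's): for $f_{v,w}(x) = v \max(0, wx)$ on a fixed dataset $\{x_i\}_{i=1}^n$, the parameter space decomposes into regions indexed by activation patterns $s \in \{0, 1\}^n$:

$$ \mathcal{R}_s = \{(v, w) : \mathbf{1}[w x_i > 0] = s_i \ \text{for all } i\}. $$

Within each $\mathcal{R}_s$, every rectified linear unit is fixed to one branch, so the network output, and hence the loss, is a smooth function of the parameters there.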
Kim et al. (2016) show that the Hadamard product in multimodal deep networks, well known as a joint function in visual question answering tasks, implicitly performs an attentional mechanism over visual inputs.
The game involves two players: a Teller and a Drawer.
In multi-agent cooperative task experiments, our model learns 20% faster than the existing state-of-the-art model.
This is mainly due to 1) the reconstruction of video stories in a combined scene-dialogue form that utilizes the latent embedding, and 2) the attention mechanism.
Catastrophic forgetting is a problem in which a neural network loses information about the first task after being trained on a second task.
To address this issue, the subgoal and option frameworks have been proposed.
However, in most service robot applications, the user needs to move so that the robot can see them face to face.
We propose a model called the composite generative adversarial network, which reveals the complex structure of images using multiple generators, each of which generates a part of the image.
We present Multimodal Residual Networks (MRN) for multimodal residual learning of visual question answering, extending the idea of deep residual learning.
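A sketch of one multimodal residual block in this spirit (the layer shapes and tanh nonlinearities below are illustrative assumptions, not the exact MRN architecture):

```python
import torch

class MultimodalResidualBlock(torch.nn.Module):
    """One residual block joining a question vector q (d_q) and a visual
    vector v (d_v) into a d_h-dimensional joint representation."""
    def __init__(self, d_q, d_v, d_h):
        super().__init__()
        self.q_proj = torch.nn.Linear(d_q, d_h)
        self.v_proj = torch.nn.Linear(d_v, d_h)
        self.shortcut = torch.nn.Linear(d_q, d_h)

    def forward(self, q, v):
        # Joint residual function: element-wise (Hadamard) product of
        # nonlinear projections of the two modalities.
        joint = torch.tanh(self.q_proj(q)) * torch.tanh(self.v_proj(v))
        # The residual (shortcut) connection carries the question signal forward.
        return self.shortcut(q) + joint
```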
The proposed architecture consists of deep representation learners and fast learnable shallow kernel networks, both of which synergize to track the information of new data.
We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification.
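In a typical local-metric formulation (a generic form, stated as an assumption rather than this paper's exact definition), the squared distance depends on the query point:

$$ d^2(x, x') = (x - x')^\top A(x)\,(x - x'), \qquad A(x) \succeq 0, $$

where choosing a constant $A$ recovers the standard global Mahalanobis metric.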