Search Results for author: Byoung-Tak Zhang

Found 78 papers, 24 papers with code

Scene Graph Parsing via Abstract Meaning Representation in Pre-trained Language Models

no code implementations NAACL (DLG4NLP) 2022 Woo Suk Choi, Yu-Jung Heo, Dharani Punithan, Byoung-Tak Zhang

In this work, we propose applying abstract meaning representation (AMR) based semantic parsing models to parse textual descriptions of a visual scene into scene graphs; to the best of our knowledge, this is the first work to do so.

AMR Parsing Dependency Parsing

Devil’s Advocate: Novel Boosting Ensemble Method from Psychological Findings for Text Classification

1 code implementation Findings (EMNLP) 2021 Hwiyeol Jo, Jaeseo Lim, Byoung-Tak Zhang

We present a new form of ensemble method, Devil’s Advocate, which uses a deliberately dissenting model to force other submodels within the ensemble to better collaborate.

text-classification Text Classification

Continual Vision-and-Language Navigation

no code implementations 22 Mar 2024 Seongjun Jeong, Gi-Cheon Kang, SeongHo Choi, Joochan Kim, Byoung-Tak Zhang

For the training and evaluation of CVLN agents, we re-arrange existing VLN datasets to propose two datasets: CVLN-I, focused on navigation via initial-instruction interpretation, and CVLN-D, aimed at navigation through dialogue with other agents.

Continual Learning Navigate +1

Multimodal Anomaly Detection based on Deep Auto-Encoder for Object Slip Perception of Mobile Manipulation Robots

no code implementations 6 Mar 2024 Youngjae Yoo, Chung-Yeon Lee, Byoung-Tak Zhang

The experimental results verified that the proposed framework reliably detects anomalies in object slip situations despite various object types and robot behaviors, and visual and auditory noise in the environment.

Anomaly Detection Object

DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning

no code implementations 14 Feb 2024 Won-Seok Choi, Hyundo Lee, Dong-Sig Han, Junseok Park, Heeyeon Koo, Byoung-Tak Zhang

Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources.

Visual Hindsight Self-Imitation Learning for Interactive Navigation

no code implementations 5 Dec 2023 Kibeom Kim, Kisung Shin, Min Whoo Lee, Moonhoen Lee, Minsu Lee, Byoung-Tak Zhang

Interactive visual navigation tasks, which involve following instructions to reach and interact with specific targets, are challenging not only because successful experiences are very rare but also because the complex visual inputs require a substantial number of samples.

Imitation Learning Visual Navigation

Neural Collage Transfer: Artistic Reconstruction via Material Manipulation

1 code implementation ICCV 2023 Ganghun Lee, Minji Kim, Yunsu Lee, Minsu Lee, Byoung-Tak Zhang

Collage is a creative art form that uses diverse material scraps as a base unit to compose a single image.

PGA: Personalizing Grasping Agents with Single Human-Robot Interaction

1 code implementation 19 Oct 2023 Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang

Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm.

Object Robotic Grasping

GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

1 code implementation 12 Jul 2023 Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang

Furthermore, the qualitative analysis shows that the unadapted VG model often fails to find correct objects due to a strong bias learned from the pre-training data.

Object Detection Visual Grounding

EXOT: Exit-aware Object Tracker for Safe Robotic Manipulation of Moving Object

1 code implementation 8 Jun 2023 Hyunseo Kim, Hye Jung Yoon, Minji Kim, Dong-Sig Han, Byoung-Tak Zhang

We evaluate our method on the first-person video benchmark dataset, TREK-150, and on the custom dataset, RMOT-223, that we collect from the UR5e robot.

Object Object Recognition

Learning Geometry-aware Representations by Sketching

no code implementations CVPR 2023 Hyundo Lee, Inwoo Hwang, Hyunsung Go, Won-Seok Choi, Kibeom Kim, Byoung-Tak Zhang

Our method, coined Learning by Sketching (LBS), learns to convert an image into a set of colored strokes that explicitly incorporate the geometric information of the scene in a single inference step without requiring a sketch dataset.

Attribute Semantic Similarity +1

SelecMix: Debiased Learning by Contradicting-pair Sampling

1 code implementation 4 Nov 2022 Inwoo Hwang, Sangjun Lee, Yunhyeok Kwak, Seong Joon Oh, Damien Teney, Jin-Hwa Kim, Byoung-Tak Zhang

Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples.

DUEL: Adaptive Duplicate Elimination on Working Memory for Self-Supervised Learning

no code implementations 31 Oct 2022 Won-Seok Choi, Dong-Sig Han, Hyundo Lee, Junseok Park, Byoung-Tak Zhang

In Self-Supervised Learning (SSL), it is known that frequent occurrences of the collision in which target data and its negative samples share the same class can decrease performance.

Self-Supervised Learning
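
The "collision" mentioned in the DUEL entry above (a target and one of its negative samples sharing the same class) can be illustrated with a small calculation. The sketch below is not the paper's method; it assumes class labels are available for inspection and that every other sample in a batch serves as a negative, and the function name `collision_rate` is hypothetical.

```python
from collections import Counter


def collision_rate(labels):
    """Fraction of ordered (anchor, negative) pairs that share a class.

    Assumption for illustration: each sample uses every other sample
    in the batch as a negative.
    """
    n = len(labels)
    if n < 2:
        return 0.0
    counts = Counter(labels)
    # A class with c members contributes c * (c - 1) colliding ordered pairs.
    colliding = sum(c * (c - 1) for c in counts.values())
    return colliding / (n * (n - 1))


balanced = [0, 0, 1, 1]
imbalanced = [0, 0, 0, 1]
print(collision_rate(balanced), collision_rate(imbalanced))
```

Under these assumptions the imbalanced batch yields a higher collision rate than the balanced one, which is the situation the entry describes as harmful to performance.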

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval

1 code implementation 23 Oct 2022 Minjoon Jung, SeongHo Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang

Video corpus moment retrieval (VCMR) is the task of retrieving the most relevant video moment from a large video corpus using a natural language query.

Moment Retrieval Multimodal Reasoning +3

Robust Imitation via Mirror Descent Inverse Reinforcement Learning

no code implementations 20 Oct 2022 Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang

Recently, adversarial imitation learning has shown a scalable reward acquisition method for inverse reinforcement learning (IRL) problems.

Density Estimation Imitation Learning +2

SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation

no code implementations 17 Oct 2022 Woo Suk Choi, Yu-Jung Heo, Byoung-Tak Zhang

To this end, we design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning representation): 1) transforming a textual description of an image into an AMR graph (Text-to-AMR) and 2) encoding the AMR graph into a Transformer-based language model to generate a scene graph (AMR-to-SG).

Dependency Parsing Graph Generation +5

Learning to Write with Coherence From Negative Examples

no code implementations 22 Sep 2022 Seonil Son, Jaeseo Lim, Youwon Jang, Jaeyoung Lee, Byoung-Tak Zhang

We compare our approach with Unlikelihood (UL) training in a text continuation task on commonsense natural language inference (NLI) corpora to show which method better models the coherence by avoiding unlikely continuations.

Natural Language Inference Sentence +1

On the Importance of Critical Period in Multi-stage Reinforcement Learning

no code implementations 9 Aug 2022 Junseok Park, Inwoo Hwang, Min Whoo Lee, Hyunseok Oh, Minsu Lee, Youngki Lee, Byoung-Tak Zhang

The initial years of an infant's life are known as the critical period, during which the overall development of learning performance is significantly impacted due to neural plasticity.

reinforcement-learning Reinforcement Learning (RL)

From Scratch to Sketch: Deep Decoupled Hierarchical Reinforcement Learning for Robotic Sketching Agent

1 code implementation 9 Aug 2022 Ganghun Lee, Minji Kim, Minsu Lee, Byoung-Tak Zhang

We present an automated learning framework for a robotic sketching agent that is capable of learning stroke-based rendering and motor control simultaneously.

Hierarchical Reinforcement Learning reinforcement-learning +1

Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

no code implementations 31 Jul 2022 Taehyeong Kim, Hyeonseop Song, Byoung-Tak Zhang

Additionally, we also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner based on the cross-modal relational graph networks.

Representation Learning Zero-Shot Learning

Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering

1 code implementation ACL 2022 Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang

Knowledge-based visual question answering (QA) aims to answer a question which requires visually-grounded external knowledge beyond image content itself.

Question Answering Visual Question Answering

Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents

no code implementations 12 Jan 2022 Junseok Park, Kwanyoung Park, Hyunseok Oh, Ganghun Lee, Minsu Lee, Youngki Lee, Byoung-Tak Zhang

To validate this hypothesis, we adapt this notion of critical periods to learning in AI agents and investigate the critical period in the virtual environment for AI agents.

Reinforcement Learning (RL) Transfer Learning

Smooth-Swap: A Simple Enhancement for Face-Swapping with Smoothness

no code implementations CVPR 2022 Jiseob Kim, Jihoon Lee, Byoung-Tak Zhang

Face-swapping models have been drawing attention for their compelling generation quality, but their complex architectures and loss functions often require careful tuning for successful training.

Contrastive Learning Face Swapping

Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning

1 code implementation NeurIPS 2021 Kibeom Kim, Min Whoo Lee, Yoonsung Kim, Je-Hwan Ryu, Minsu Lee, Byoung-Tak Zhang

Learning in a multi-target environment without prior knowledge about the targets requires a large amount of samples and makes generalization difficult.

reinforcement-learning Reinforcement Learning (RL) +1

Toward a Human-Level Video Understanding Intelligence

no code implementations 8 Oct 2021 Yu-Jung Heo, Minsu Lee, SeongHo Choi, Woo Suk Choi, Minjung Shin, Minjoon Jung, Jeh-Kwang Ryu, Byoung-Tak Zhang

In this paper, we propose the Video Turing Test to provide effective and practical assessments of video understanding intelligence as well as human-likeness evaluation of AI agents.

Video Understanding

Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering

no code implementations 11 Aug 2021 Donggeon Lee, SeongHo Choi, Youwon Jang, Byoung-Tak Zhang

In this paper, we challenge the existing multiple-choice video question answering by changing it to open-ended video question answering.

Language Modelling Multiple-choice +2

CogME: A Novel Evaluation Metric for Video Understanding Intelligence

no code implementations 21 Jul 2021 Minjung Shin, Jeonghoon Kim, SeongHo Choi, Yu-Jung Heo, Donghyun Kim, Minsu Lee, Byoung-Tak Zhang, Jeh-Kwang Ryu

Then we propose a top-down evaluation system for VideoQA, based on the cognitive process of humans and story elements: Cognitive Modules for Evaluation (CogME).

Question Answering Sentence +2

M2FN: Multi-step Modality Fusion for Advertisement Image Assessment

no code implementations 31 Jan 2021 Kyung-Wha Park, Jung-Woo Ha, Junghoon Lee, Sunyoung Kwon, Kyung-Min Kim, Byoung-Tak Zhang

Assessing advertisements, specifically on the basis of user preferences and ad quality, is crucial to the marketing industry.

Marketing

Learning task-agnostic representation via toddler-inspired learning

no code implementations 27 Jan 2021 Kwanyoung Park, Junseok Park, Hyunseok Oh, Byoung-Tak Zhang, Youngki Lee

One of the inherent limitations of current AI systems, stemming from the passive learning mechanisms (e.g., supervised learning), is that they perform well on labeled datasets but cannot deduce knowledge on their own.

Image Classification Object Localization

Unbiased learning with State-Conditioned Rewards in Adversarial Imitation Learning

no code implementations 1 Jan 2021 Dong-Sig Han, Hyunseo Kim, Hyundo Lee, Je-Hwan Ryu, Byoung-Tak Zhang

The formulation draws a strong connection between adversarial learning and energy-based reinforcement learning; thus, the architecture is capable of recovering a reward function that induces a multi-modal policy.

Continuous Control Imitation Learning +2

Ruminating Word Representations with Random Noise Masking

no code implementations 1 Jan 2021 Hwiyeol Jo, Byoung-Tak Zhang

Through the re-training process, some of the noise can be compensated for, and the rest can be utilized to learn better representations.

text-classification Text Classification +1

Deep Quotient Manifold Modeling

no code implementations 1 Jan 2021 Jiseob Kim, Seungjae Jung, Hyundo Lee, Byoung-Tak Zhang

One of the difficulties in modeling real-world data is their complex multi-manifold structure due to discrete features.

Spectrally Similar Graph Pooling

no code implementations 1 Jan 2021 Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang

To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.

Image Retrieval Retrieval

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

no code implementations 2 Dec 2020 Taehyeong Kim, Injune Hwang, Hyundo Lee, Hyunseo Kim, Won-Seok Choi, Joseph J. Lim, Byoung-Tak Zhang

Active learning is widely used to reduce labeling effort and training time by repeatedly querying only the most beneficial samples from unlabeled data.

Active Learning

Human-Like Active Learning: Machines Simulating the Human Learning Process

no code implementations 7 Nov 2020 Jaeseo Lim, Hwiyeol Jo, Byoung-Tak Zhang, Jooyong Park

In the end, we showed not only that we can build a better machine training framework from the human experiment results, but also that imitated machine experiments empirically confirm those results; human-like active learning has a crucial effect on learning performance.

Active Learning Knowledge Distillation

Co-attentional Transformers for Story-Based Video Understanding

no code implementations 27 Oct 2020 Björn Bebensee, Byoung-Tak Zhang

Inspired by recent trends in vision and language learning, we explore applications of attention mechanisms for visio-lingual fusion within an application to story-based video understanding.

Question Answering Video Question Answering +1

Pattern Denoising in Molecular Associative Memory using Pairwise Markov Random Field Models

no code implementations 28 May 2020 Dharani Punithan, Byoung-Tak Zhang

We propose an in silico molecular associative memory model for pattern learning, storage and denoising using a Pairwise Markov Random Field (PMRF) model.

Denoising

DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

1 code implementation 7 May 2020 Seong-Ho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang

Despite recent progress in computer vision and natural language processing, developing a machine that can understand video stories is still hard to achieve due to the intrinsic difficulty of video stories.

Question Answering Video Question Answering +1

Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data

no code implementations 17 Jan 2020 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.

Graph Learning Video Understanding

Ruminating Word Representations with Random Noised Masker

no code implementations 8 Nov 2019 Hwiyeol Jo, Byoung-Tak Zhang

Next, we gradually add random noises to the word representations and repeat the training process from scratch, but initialize with the noised word representations.

text-classification Text Classification +1

Which Ads to Show? Advertisement Image Assessment with Auxiliary Information via Multi-step Modality Fusion

no code implementations 6 Oct 2019 Kyung-Wha Park, Junghoon Lee, Sunyoung Kwon, Jung-Woo Ha, Kyung-Min Kim, Byoung-Tak Zhang

Despite the crucial influence of image quality, auxiliary information of ad images such as tags and target subjects can also determine image preference.

Manifold Learning and Alignment with Generative Adversarial Networks

no code implementations 25 Sep 2019 Jiseob Kim, Seungjae Jung, Hyundo Lee, Byoung-Tak Zhang

We present a generative adversarial network (GAN) that conducts manifold learning and alignment (MLA): A task to learn the multi-manifold structure underlying data and to align those manifolds without any correspondence information.

Disentanglement Generative Adversarial Network

Discriminative Variational Autoencoder for Continual Learning with Generative Replay

no code implementations 25 Sep 2019 Woo-Young Kang, Cheol-Ho Han, Byoung-Tak Zhang

Generative replay (GR) is a method to alleviate catastrophic forgetting in continual learning (CL) by generating previous task data and learning them together with the data from new tasks.

Continual Learning Permuted-MNIST +2

Compositional Structure Learning for Sequential Video Data

no code implementations 3 Jul 2019 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods.

Encoder-Powered Generative Adversarial Networks

no code implementations 3 Jun 2019 Jiseob Kim, Seungjae Jung, Hyundo Lee, Byoung-Tak Zhang

We present an encoder-powered generative adversarial network (EncGAN) that is able to learn both the multi-manifold structure and the abstract features of data.

Generative Adversarial Network Style Transfer

Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models

no code implementations 9 May 2019 Sungjae Cho, Jaeseo Lim, Chris Hickey, Jung Ae Park, Byoung-Tak Zhang

Problem difficulty was operationalized by the number of carries involved in solving a given problem.
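
The carry-count measure of difficulty in the entry above can be sketched in a few lines. This is only an illustration under stated assumptions: it handles addition of non-negative integers, the number base is a parameter because the listing does not say which base the paper uses, and the function name `count_carries` is hypothetical.

```python
def count_carries(a, b, base=10):
    """Count the carry operations needed to add a and b digit by digit.

    Assumptions for illustration: a and b are non-negative integers and
    the operation is addition in the given base.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    carries = 0
    carry = 0
    while a > 0 or b > 0:
        # Add the current lowest digits plus any carry from the previous column.
        column_sum = a % base + b % base + carry
        carry = 1 if column_sum >= base else 0
        carries += carry
        a //= base
        b //= base
    return carries


print(count_carries(123, 456))                 # no column overflows
print(count_carries(58, 67))                   # both columns overflow
print(count_carries(0b1011, 0b0110, base=2))   # binary operands
```

Under this measure, a problem such as 58 + 67 would count as harder than 123 + 456, because it requires more carries.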

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

2 code implementations IJCNLP 2019 Gi-Cheon Kang, Jaeseo Lim, Byoung-Tak Zhang

Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism.

Question Answering Visual Dialog +2

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

no code implementations 20 Jan 2019 Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang

While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies.

Data Interpolations in Deep Generative Models under Non-Simply-Connected Manifold Topology

no code implementations 20 Jan 2019 Jiseob Kim, Byoung-Tak Zhang

Exploiting the deep generative model's remarkable ability to learn the data-manifold structure, some recent studies have proposed a geometric data interpolation method based on the geodesic curves on the learned data-manifold.

GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation

2 code implementations 28 May 2018 Taehyeong Kim, Min-Oh Heo, Seonil Son, Kyoung-Wha Park, Byoung-Tak Zhang

The task of multi-image cued story generation, such as the visual storytelling dataset (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images.

Ranked #30 on Visual Storytelling on VIST (METEOR metric)

Sentence Visual Storytelling

Bilinear Attention Networks

8 code implementations NeurIPS 2018 Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang

In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly.

Visual Question Answering

Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog

1 code implementation NeurIPS 2018 Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang

Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take.

Goal-Oriented Dialog Visual Dialog

Understanding Local Minima in Neural Networks by Loss Surface Decomposition

no code implementations ICLR 2018 Hanock Kwak, Byoung-Tak Zhang

The parameter domain of the loss surface can be decomposed into regions in which activation values (zero or one for rectified linear units) are consistent.

Visual Explanations from Hadamard Product in Multimodal Deep Networks

no code implementations 18 Dec 2017 Jin-Hwa Kim, Byoung-Tak Zhang

Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known for the joint function of visual question answering tasks, implicitly performs an attentional mechanism for visual inputs.

Question Answering Visual Question Answering

Multi-focus Attention Network for Efficient Deep Reinforcement Learning

no code implementations 13 Dec 2017 Jinyoung Choi, Beom-Jin Lee, Byoung-Tak Zhang

In multi-agent cooperative task experiments, our model shows 20% faster learning than the existing state-of-the-art model.

reinforcement-learning Reinforcement Learning (RL)

DeepStory: Video Story QA by Deep Embedded Memory Networks

no code implementations 4 Jul 2017 Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, Byoung-Tak Zhang

This is mainly due to 1) the reconstruction of video stories in a scene-dialogue combined form that utilizes the latent embedding and 2) attention.

Question Answering Video Story QA

Overcoming Catastrophic Forgetting by Incremental Moment Matching

1 code implementation NeurIPS 2017 Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, Byoung-Tak Zhang

Catastrophic forgetting is a problem in which a neural network loses the information of the first task after training on the second task.

Transfer Learning

Ways of Conditioning Generative Adversarial Networks

no code implementations 4 Nov 2016 Hanock Kwak, Byoung-Tak Zhang

The GANs are generative models whose random samples realistically reflect natural images.

Human Body Orientation Estimation using Convolutional Neural Network

no code implementations 7 Sep 2016 Jinyoung Choi, Beom-Jin Lee, Byoung-Tak Zhang

However, in most of the service robot applications, the user needs to move himself/herself to allow the robot to see him/her face to face.

Face Detection

Generating Images Part by Part with Composite Generative Adversarial Networks

no code implementations 19 Jul 2016 Hanock Kwak, Byoung-Tak Zhang

We propose a model called composite generative adversarial network, that reveals the complex structure of images with multiple generators in which each generator generates some part of the image.

Generative Adversarial Network Image Generation

Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy

no code implementations 15 Jun 2015 Sang-Woo Lee, Min-Oh Heo, Jiwon Kim, Jeonghee Kim, Byoung-Tak Zhang

The proposed architecture consists of deep representation learners and fast learnable shallow kernel networks, both of which synergize to track the information of new data.

Transfer Learning
