Search Results for author: Dong Huk Park

Found 14 papers, 5 papers with code

Robust Change Captioning

1 code implementation • ICCV 2019 • Dong Huk Park, Trevor Darrell, Anna Rohrbach

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

Natural Language Visual Grounding
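A minimal sketch of the dual attention idea behind the entry above: two attention scorers pool "before" and "after" image features separately, conditioned on the caption decoder state. Module names, shapes, and the decoder interface are illustrative assumptions, not the paper's DUDA implementation.

```python
# Illustrative sketch (not the paper's DUDA code): dual spatial attention over
# "before"/"after" CNN features, conditioned on the caption decoder state.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualSpatialAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Separate attention scorers for the "before" and "after" images.
        self.score_before = nn.Linear(feat_dim + hidden_dim, 1)
        self.score_after = nn.Linear(feat_dim + hidden_dim, 1)

    def _attend(self, feats, hidden, scorer):
        # feats: (B, N, D) spatial features; hidden: (B, H) decoder state.
        h = hidden.unsqueeze(1).expand(-1, feats.size(1), -1)
        weights = F.softmax(scorer(torch.cat([feats, h], dim=-1)), dim=1)
        return (weights * feats).sum(dim=1)  # (B, D) attended feature

    def forward(self, feats_before, feats_after, hidden):
        v_bef = self._attend(feats_before, hidden, self.score_before)
        v_aft = self._attend(feats_after, hidden, self.score_after)
        return v_bef, v_aft, v_aft - v_bef   # difference cue for the captioner
```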

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations • 14 Dec 2016 • Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world that led to those decisions.

Decision Making • Question Answering • +2

Learning a Unified Embedding for Visual Search at Pinterest

no code implementations • 5 Aug 2019 • Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park, Charles Rosenberg

The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but also takes advantage of correlated information across the combined training data from each application to generate a unified embedding that outperforms all of the specialized embeddings previously deployed for each product.

Metric Learning • Navigate • +2
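A minimal sketch of the unified-embedding idea described above, assuming a shared CNN trunk, a single embedding head, and one proxy classifier per application objective. The backbone choice, dimensions, and task heads are illustrative placeholders, not the deployed Pinterest system.

```python
# Illustrative sketch (not the deployed system): one shared embedding trained
# against several application-specific classification objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class UnifiedEmbedding(nn.Module):
    def __init__(self, embed_dim=256, num_classes_per_task=(1000, 500)):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()             # reuse the CNN trunk as a feature extractor
        self.backbone = backbone
        self.proj = nn.Linear(2048, embed_dim)  # single shared embedding
        # One proxy classifier head per application objective.
        self.heads = nn.ModuleList([nn.Linear(embed_dim, n) for n in num_classes_per_task])

    def forward(self, images):
        emb = F.normalize(self.proj(self.backbone(images)), dim=-1)
        return emb, [head(emb) for head in self.heads]  # embedding + per-task logits
```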

Novelty Detection with Rotated Contrastive Predictive Coding

no code implementations • 1 Jan 2021 • Dong Huk Park, Trevor Darrell

To this end, reconstruction-based learning is often used, in which the normality of an observation is expressed by how well it can be reconstructed.

Contrastive Learning • Novelty Detection
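A minimal sketch of the reconstruction-based baseline the abstract refers to (not the paper's rotated contrastive predictive coding method): an autoencoder is trained on normal data and novelty is scored by reconstruction error. Dimensions and architecture are arbitrary.

```python
# Illustrative sketch of the reconstruction-based baseline described above
# (not the paper's rotated-CPC method): novelty scored by autoencoder error.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=784, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def novelty_score(model, x):
    # Higher reconstruction error -> more likely novel / out-of-distribution.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=-1)
```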

Toward Transformer-Based Object Detection

no code implementations • 17 Dec 2020 • Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that, compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks.

Object • object-detection • +1
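A minimal sketch of the patch-tokenization idea behind the Vision Transformer mentioned above: the image is split into non-overlapping patches, projected to tokens, and fed to a standard transformer encoder. Hyperparameters are arbitrary, and the detection-specific parts of the paper are not shown.

```python
# Illustrative sketch of patch tokenization + a transformer encoder (arbitrary
# hyperparameters); the detection head from the paper is not shown.
import torch
import torch.nn as nn

class MiniViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8, num_classes=1000):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # non-overlapping patches
        num_tokens = (img_size // patch) ** 2 + 1
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images):
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(images.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, tokens], dim=1) + self.pos_embed)
        return self.head(x[:, 0])  # classify from the CLS token
```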

Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations

no code implementations • 12 Aug 2021 • Josh Beal, Hao-Yu Wu, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems have been relatively unexplored.

Ranked #26 on Image Classification on ObjectNet (using extra training data)

Image Classification • Multi-Task Learning • +2

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

no code implementations • 10 Dec 2021 • Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.

Continuous Control • Denoising • +1
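A minimal sketch of gradient-based guidance in the spirit of the entry above, assuming a `denoiser` that returns the mean and sigma of the reverse step and a placeholder `guidance_fn` (e.g., CLIP similarity to a prompt or a reference image). This mirrors generic classifier-style guidance rather than the paper's exact formulation.

```python
# Illustrative sketch of gradient-based guidance (not the paper's exact
# formulation). `denoiser` and `guidance_fn` are placeholder interfaces.
import torch

def guided_step(x_t, t, denoiser, guidance_fn, scale=100.0):
    """One guided reverse-diffusion step.
    denoiser(x_t, t)    -> (mean, sigma) of the reverse step (assumed interface)
    guidance_fn(x_t, t) -> scalar score, e.g. CLIP similarity to a prompt or reference image
    """
    x_t = x_t.detach().requires_grad_(True)
    score = guidance_fn(x_t, t).sum()
    grad = torch.autograd.grad(score, x_t)[0]   # direction that increases the guidance score
    mean, sigma = denoiser(x_t, t)
    mean = mean + scale * sigma ** 2 * grad     # shift the predicted mean toward the guidance signal
    return mean + sigma * torch.randn_like(mean)
```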

Shape-Guided Diffusion with Inside-Outside Attention

no code implementations • 1 Dec 2022 • Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell

We introduce the precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion.

Object
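A minimal sketch of one way an object silhouette could constrain cross-attention, in the spirit of the inside-outside attention named in the title above: prompt tokens describing the object attend only inside the mask, and the remaining tokens only outside. The function and its interface are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation) of turning an object
# silhouette into an attention constraint: prompt tokens that describe the
# object may only attend inside the mask, all other tokens only outside.
import torch

def silhouette_attention_mask(silhouette, is_object_token):
    """
    silhouette:      (H*W,) bool, True inside the object silhouette
    is_object_token: (T,)   bool, True for prompt tokens describing the object
    returns:         (T, H*W) bool mask, True where attention is allowed
    """
    inside = silhouette.unsqueeze(0)          # (1, H*W)
    obj = is_object_token.unsqueeze(1)        # (T, 1)
    return torch.where(obj, inside, ~inside)  # object tokens -> inside, others -> outside

# The mask can be applied to cross-attention logits before the softmax, e.g.
# logits.masked_fill(~mask, float("-inf")).
```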

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

no code implementations • NeurIPS 2023 • Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell

We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks.

Semantic correspondence
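A minimal sketch of consolidating multi-scale, multi-timestep diffusion features into per-pixel descriptors, as described above: each feature map is projected, upsampled to a common resolution, and combined with learned mixing weights. Names and dimensions are illustrative, not the released Diffusion Hyperfeatures code.

```python
# Illustrative sketch (not the released code): project feature maps taken at
# several scales and diffusion timesteps, upsample them to a common resolution,
# and combine them with learned mixing weights into per-pixel descriptors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregator(nn.Module):
    def __init__(self, in_dims, out_dim=384, resolution=64):
        super().__init__()
        self.resolution = resolution
        self.projections = nn.ModuleList([nn.Conv2d(d, out_dim, 1) for d in in_dims])
        self.mix_weights = nn.Parameter(torch.zeros(len(in_dims)))  # one weight per (layer, timestep) map

    def forward(self, feature_maps):
        # feature_maps: list of (B, C_i, H_i, W_i) tensors from different layers/timesteps.
        upsampled = [
            F.interpolate(proj(f), size=self.resolution, mode="bilinear", align_corners=False)
            for proj, f in zip(self.projections, feature_maps)
        ]
        w = torch.softmax(self.mix_weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, upsampled))  # (B, out_dim, R, R) descriptors
```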
