Search Results for author: Dong Huk Park

Found 14 papers, 5 papers with code

Robust Change Captioning

1 code implementation • ICCV 2019 • Dong Huk Park, Trevor Darrell, Anna Rohrbach

We present a novel Dual Dynamic Attention Model (DUDA) to perform robust Change Captioning.

Natural Language Visual Grounding
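A minimal sketch of the dual attention idea behind the entry above: two attention scorers pool "before" and "after" image features separately, conditioned on the caption decoder state. Module names, shapes, and the decoder interface are illustrative assumptions, not the paper's DUDA implementation.

```python
# Illustrative sketch (not the paper's DUDA code): dual spatial attention over
# "before"/"after" CNN features, conditioned on the caption decoder state.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualSpatialAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Separate attention scorers for the "before" and "after" images.
        self.score_before = nn.Linear(feat_dim + hidden_dim, 1)
        self.score_after = nn.Linear(feat_dim + hidden_dim, 1)

    def _attend(self, feats, hidden, scorer):
        # feats: (B, N, D) spatial features; hidden: (B, H) decoder state.
        h = hidden.unsqueeze(1).expand(-1, feats.size(1), -1)
        weights = F.softmax(scorer(torch.cat([feats, h], dim=-1)), dim=1)
        return (weights * feats).sum(dim=1)  # (B, D) attended feature

    def forward(self, feats_before, feats_after, hidden):
        v_bef = self._attend(feats_before, hidden, self.score_before)
        v_aft = self._attend(feats_after, hidden, self.score_after)
        return v_bef, v_aft, v_aft - v_bef   # difference cue for the captioner
```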

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations • 14 Dec 2016 • Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world that led to those decisions.

Decision Making • Question Answering • +2

Learning a Unified Embedding for Visual Search at Pinterest

no code implementations • 5 Aug 2019 • Andrew Zhai, Hao-Yu Wu, Eric Tzeng, Dong Huk Park, Charles Rosenberg

The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but also takes advantage of correlated information across the combined training data from each application to generate a unified embedding that outperforms all of the specialized embeddings previously deployed for each product.

Metric Learning • Navigate • +2
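A minimal sketch of the unified-embedding idea described above, assuming a shared CNN trunk, a single embedding head, and one proxy classifier per application objective. The backbone choice, dimensions, and task heads are illustrative placeholders, not the deployed Pinterest system.

```python
# Illustrative sketch (not the deployed system): one shared embedding trained
# against several application-specific classification objectives.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class UnifiedEmbedding(nn.Module):
    def __init__(self, embed_dim=256, num_classes_per_task=(1000, 500)):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()             # reuse the CNN trunk as a feature extractor
        self.backbone = backbone
        self.proj = nn.Linear(2048, embed_dim)  # single shared embedding
        # One proxy classifier head per application objective.
        self.heads = nn.ModuleList([nn.Linear(embed_dim, n) for n in num_classes_per_task])

    def forward(self, images):
        emb = F.normalize(self.proj(self.backbone(images)), dim=-1)
        return emb, [head(emb) for head in self.heads]  # embedding + per-task logits
```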

Novelty Detection with Rotated Contrastive Predictive Coding

no code implementations • 1 Jan 2021 • Dong Huk Park, Trevor Darrell

To this end, reconstruction-based learning is often used, in which the normality of an observation is expressed by how well it can be reconstructed.

Contrastive Learning • Novelty Detection
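A minimal sketch of the reconstruction-based baseline the abstract refers to (not the paper's rotated contrastive predictive coding method): an autoencoder is trained on normal data and novelty is scored by reconstruction error. Dimensions and architecture are arbitrary.

```python
# Illustrative sketch of the reconstruction-based baseline described above
# (not the paper's rotated-CPC method): novelty scored by autoencoder error.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=784, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def novelty_score(model, x):
    # Higher reconstruction error -> more likely novel / out-of-distribution.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=-1)
```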

Toward Transformer-Based Object Detection

no code implementations • 17 Dec 2020 • Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

The Vision Transformer was the first major attempt to apply a pure transformer model directly to images as input, demonstrating that, compared to convolutional networks, transformer-based architectures can achieve competitive results on benchmark classification tasks.

Object • object-detection • +1
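A minimal sketch of the patch-tokenization idea behind the Vision Transformer mentioned above: the image is split into non-overlapping patches, projected to tokens, and fed to a standard transformer encoder. Hyperparameters are arbitrary, and the detection-specific parts of the paper are not shown.

```python
# Illustrative sketch of patch tokenization + a transformer encoder (arbitrary
# hyperparameters); the detection head from the paper is not shown.
import torch
import torch.nn as nn

class MiniViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8, num_classes=1000):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # non-overlapping patches
        num_tokens = (img_size // patch) ** 2 + 1
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images):
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(images.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, tokens], dim=1) + self.pos_embed)
        return self.head(x[:, 0])  # classify from the CLS token
```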

Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations

no code implementations • 12 Aug 2021 • Josh Beal, Hao-Yu Wu, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk

Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems have been relatively unexplored.

Ranked #26 on Image Classification on ObjectNet (using extra training data)

Image Classification • Multi-Task Learning • +2

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

no code implementations • 10 Dec 2021 • Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.

Continuous Control • Denoising • +1
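A minimal sketch of gradient-based guidance in the spirit of the entry above, assuming a `denoiser` that returns the mean and sigma of the reverse step and a placeholder `guidance_fn` (e.g., CLIP similarity to a prompt or a reference image). This mirrors generic classifier-style guidance rather than the paper's exact formulation.

```python
# Illustrative sketch of gradient-based guidance (not the paper's exact
# formulation). `denoiser` and `guidance_fn` are placeholder interfaces.
import torch

def guided_step(x_t, t, denoiser, guidance_fn, scale=100.0):
    """One guided reverse-diffusion step.
    denoiser(x_t, t)    -> (mean, sigma) of the reverse step (assumed interface)
    guidance_fn(x_t, t) -> scalar score, e.g. CLIP similarity to a prompt or reference image
    """
    x_t = x_t.detach().requires_grad_(True)
    score = guidance_fn(x_t, t).sum()
    grad = torch.autograd.grad(score, x_t)[0]   # direction that increases the guidance score
    mean, sigma = denoiser(x_t, t)
    mean = mean + scale * sigma ** 2 * grad     # shift the predicted mean toward the guidance signal
    return mean + sigma * torch.randn_like(mean)
```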

Shape-Guided Diffusion with Inside-Outside Attention

no code implementations • 1 Dec 2022 • Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell

We introduce the precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion.

Object
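A minimal sketch of one way an object silhouette could constrain cross-attention, in the spirit of the inside-outside attention named in the title above: prompt tokens describing the object attend only inside the mask, and the remaining tokens only outside. The function and its interface are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation) of turning an object
# silhouette into an attention constraint: prompt tokens that describe the
# object may only attend inside the mask, all other tokens only outside.
import torch

def silhouette_attention_mask(silhouette, is_object_token):
    """
    silhouette:      (H*W,) bool, True inside the object silhouette
    is_object_token: (T,)   bool, True for prompt tokens describing the object
    returns:         (T, H*W) bool mask, True where attention is allowed
    """
    inside = silhouette.unsqueeze(0)          # (1, H*W)
    obj = is_object_token.unsqueeze(1)        # (T, 1)
    return torch.where(obj, inside, ~inside)  # object tokens -> inside, others -> outside

# The mask can be applied to cross-attention logits before the softmax, e.g.
# logits.masked_fill(~mask, float("-inf")).
```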

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

no code implementations • NeurIPS 2023 • Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell

We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks.

Semantic correspondence
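A minimal sketch of consolidating multi-scale, multi-timestep diffusion features into per-pixel descriptors, as described above: each feature map is projected, upsampled to a common resolution, and combined with learned mixing weights. Names and dimensions are illustrative, not the released Diffusion Hyperfeatures code.

```python
# Illustrative sketch (not the released code): project feature maps taken at
# several scales and diffusion timesteps, upsample them to a common resolution,
# and combine them with learned mixing weights into per-pixel descriptors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregator(nn.Module):
    def __init__(self, in_dims, out_dim=384, resolution=64):
        super().__init__()
        self.resolution = resolution
        self.projections = nn.ModuleList([nn.Conv2d(d, out_dim, 1) for d in in_dims])
        self.mix_weights = nn.Parameter(torch.zeros(len(in_dims)))  # one weight per (layer, timestep) map

    def forward(self, feature_maps):
        # feature_maps: list of (B, C_i, H_i, W_i) tensors from different layers/timesteps.
        upsampled = [
            F.interpolate(proj(f), size=self.resolution, mode="bilinear", align_corners=False)
            for proj, f in zip(self.projections, feature_maps)
        ]
        w = torch.softmax(self.mix_weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, upsampled))  # (B, out_dim, R, R) descriptors
```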
