no code implementations • 5 Apr 2024 • Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho
We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.
Ranked #4 on Action Recognition on Diving-48
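To illustrate the idea of attending via correlation *patterns* rather than raw query-key similarities, here is a minimal NumPy sketch; the 1-D convolution over the correlation map and the kernel shape are illustrative stand-ins, not the paper's actual StructSA operator:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structural_self_attention(x, w_q, w_k, w_v, corr_kernel):
    """Toy structural self-attention over a 1-D token sequence.

    Instead of using the raw query-key correlation map directly, each
    row of the map is convolved with a small kernel, so that local
    correlation patterns (not just point-wise similarities) shape the
    attention weights -- a simplified stand-in for StructSA.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    corr = q @ k.T / np.sqrt(q.shape[-1])           # (T, T) correlation map
    pad = len(corr_kernel) // 2
    padded = np.pad(corr, ((0, 0), (pad, pad)), mode="edge")
    struct = np.stack([np.convolve(row, corr_kernel, mode="valid")
                       for row in padded])          # pattern-transformed map
    attn = softmax(struct, axis=-1)
    return attn @ v
```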
1 code implementation • CVPR 2023 • Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP.
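A rough sketch of the zero-shot selection step, assuming candidate region masks and a referring expression have already been encoded into a shared CLIP-style embedding space (the actual method's mask proposal and feature extraction are abstracted away):

```python
import numpy as np

def pick_referred_mask(region_feats, text_feat):
    """Score each candidate region against the referring expression by
    cosine similarity in a shared image-text embedding space and return
    the index of the best-matching mask. Inputs are stand-ins for
    CLIP region/text embeddings."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return int(np.argmax(r @ t))
```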
no code implementations • CVPR 2023 • Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
(ii) We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively; and finally (iii) we show that our model achieves state-of-the-art zero-shot results on three different AV-ASR benchmarks (How2, VisSpeech and Ego4D), while also crucially preserving decent performance on traditional audio-only speech recognition benchmarks (LibriSpeech).
1 code implementation • CVPR 2023 • Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin
In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations.
3 code implementations • 21 Mar 2023 • Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions.
Ranked #1 on Open Vocabulary Semantic Segmentation on ADE20K-150
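The core open-vocabulary step can be sketched as nearest-text-embedding classification per pixel; this toy version assumes pixel features and class-name embeddings already live in a shared space, and omits the cost-aggregation machinery of the actual method:

```python
import numpy as np

def label_pixels(pixel_feats, class_embeds):
    """Assign each pixel the category whose text embedding it matches
    best by cosine similarity.

    pixel_feats: (H, W, D) per-pixel features; class_embeds: (C, D)
    text embeddings of the target category names. Returns an (H, W)
    integer label map."""
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    c = class_embeds / np.linalg.norm(class_embeds, axis=-1, keepdims=True)
    return np.argmax(p @ c.T, axis=-1)
```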
3 code implementations • CVPR 2023 • Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily available at scale.
Ranked #1 on Dense Video Captioning on ActivityNet Captions (using extra training data)
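The single-stage formulation can be sketched by serializing timestamped events into one token sequence with quantized time tokens; the `<time_k>` token names and bin count below are illustrative, not Vid2Seq's exact vocabulary:

```python
def events_to_sequence(events, duration, n_bins=100):
    """Serialize (start, end, caption) events into a single token
    sequence, quantizing timestamps into `n_bins` special time tokens
    so one decoder can emit both event boundaries and captions."""
    seq = []
    for start, end, caption in sorted(events):
        s = min(int(start / duration * n_bins), n_bins - 1)
        e = min(int(end / duration * n_bins), n_bins - 1)
        seq += [f"<time_{s}>", f"<time_{e}>"] + caption.split()
    return seq
```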
no code implementations • 18 Nov 2022 • Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
In this report, we describe our submission to the Ego4D AudioVisual (AV) Speech Transcription Challenge 2022.
1 code implementation • 15 Jun 2022 • Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.
Automatic Speech Recognition (ASR) +1
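A minimal sketch of conditioning a recognizer on visual context, assuming precomputed audio-frame and visual-frame features; this simple pool-and-concatenate fusion is an illustration, not the paper's architecture:

```python
import numpy as np

def fuse_audio_visual(audio_feats, visual_feats):
    """Early-fusion sketch for AV-ASR: mean-pool the visual track and
    concatenate the pooled vector to every audio frame, so downstream
    recognition can condition on visual cues.

    audio_feats: (T, Da); visual_feats: (F, Dv) -> returns (T, Da+Dv)."""
    v = visual_feats.mean(axis=0)
    v_tiled = np.broadcast_to(v, (audio_feats.shape[0], v.shape[0]))
    return np.concatenate([audio_feats, v_tiled], axis=1)
```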
no code implementations • 1 Apr 2022 • Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
To close this gap we propose a new video mining pipeline which involves transferring captions from image captioning datasets to video clips with no additional manual effort.
Ranked #6 on Zero-shot Text to Audio Retrieval on AudioCaps
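The mining pipeline's transfer step can be sketched as nearest-neighbor caption transfer in a shared embedding space; the threshold and the assumption of precomputed clip/image embeddings are illustrative choices:

```python
import numpy as np

def transfer_captions(clip_feats, image_feats, image_captions, threshold=0.8):
    """For each video clip, find its nearest captioned image by cosine
    similarity and transfer that caption if the similarity clears a
    threshold; otherwise leave the clip unlabeled (None). A toy version
    of similarity-based caption mining with no manual effort."""
    c = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    i = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = c @ i.T
    out = []
    for row in sims:
        j = int(np.argmax(row))
        out.append(image_captions[j] if row[j] >= threshold else None)
    return out
```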
no code implementations • CVPR 2022 • Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
Recent video and language pretraining frameworks lack the ability to generate sentences.
Ranked #14 on Video Captioning on MSR-VTT (using extra training data)
no code implementations • CVPR 2021 • Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
Leveraging recent advances in multimodal learning, our model consists of a novel co-attentional multimodal video transformer, and when trained on both textual and visual context, outperforms baselines that use textual inputs alone.
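One co-attentional step can be sketched as symmetric cross-attention, where text tokens attend over video tokens and vice versa; projections, heads, and normalization layers of the actual transformer are omitted for brevity:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text, video):
    """Symmetric cross-attention sketch: each modality is updated with
    a residual attention readout over the other, so text and video
    contextualize one another."""
    d = text.shape[-1]
    a_tv = softmax(text @ video.T / np.sqrt(d), axis=-1)   # text -> video
    a_vt = softmax(video @ text.T / np.sqrt(d), axis=-1)   # video -> text
    return text + a_tv @ video, video + a_vt @ text
```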
1 code implementation • NeurIPS 2019 • Paul Hongsuck Seo, Geeho Kim, Bohyung Han
Label noise is one of the critical factors that significantly degrade the generalization performance of deep neural networks.
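The standard experimental setup for studying this problem injects symmetric label noise into a clean dataset; a minimal sketch (the noise model is the common benchmark convention, not this paper's specific method):

```python
import numpy as np

def corrupt_labels(labels, n_classes, noise_rate, rng):
    """Symmetric label noise: with probability `noise_rate`, flip each
    label to a different class chosen uniformly at random."""
    labels = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    for idx in np.where(flip)[0]:
        choices = [c for c in range(n_classes) if c != labels[idx]]
        labels[idx] = rng.choice(choices)
    return labels
```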
no code implementations • 21 Nov 2019 • Paul Hongsuck Seo, Piyush Sharma, Tomer Levinboim, Bohyung Han, Radu Soricut
Human ratings are currently the most accurate way to assess the quality of an image captioning model, yet an expensive human rating evaluation typically yields only a few overall statistics over the evaluation dataset.
no code implementations • 3 Oct 2019 • Wonpyo Park, Paul Hongsuck Seo, Bohyung Han, Minsu Cho
We introduce a novel stochastic regularization technique for deep neural networks, which decomposes a layer into multiple branches with different parameters and merges stochastically sampled combinations of the outputs from the branches during training.
no code implementations • CVPR 2019 • Seonguk Seo, Paul Hongsuck Seo, Bohyung Han
The proposed loss function enables us to learn deep neural networks that predict confidence calibrated scores using a single inference.
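Calibration of this kind is commonly measured with Expected Calibration Error (ECE), which the sketch below computes; this is the standard metric, not the paper's proposed loss:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average the gap between mean
    confidence and accuracy within each bin, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.sum() / n * gap
    return ece
```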
no code implementations • ECCV 2018 • Paul Hongsuck Seo, Tobias Weyand, Jack Sim, Bohyung Han
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information.
Ranked #1 on Photo geolocation estimation on Im2GPS (Reference images metric)
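Geolocalization is often posed as classification over geographic cells; a simple sketch that reads out a location from cell probabilities (the cell partitioning itself, which is the paper's focus, is abstracted away):

```python
import numpy as np

def geolocate(cell_probs, cell_centers):
    """Predict a (lat, lng) as the probability-weighted average of
    geographic cell centers, given classifier probabilities over cells.
    A simplified readout for cell-classification geolocalization."""
    return cell_probs @ cell_centers / cell_probs.sum()
```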
no code implementations • ECCV 2018 • Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho
Semantic correspondence is the problem of establishing correspondences across images depicting different instances of the same object or scene class.
no code implementations • NeurIPS 2017 • Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
From this memory, the model retrieves the previous attention that is most relevant to the current question, taking recency into account, in order to resolve potentially ambiguous references.
Ranked #13 on Visual Dialog on VisDial v0.9 val (R@1 metric)
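The retrieval step can be sketched as a relevance-and-recency weighted combination of stored attention maps; the exponential decay and the precomputed relevance scores below are illustrative simplifications:

```python
import numpy as np

def retrieve_attention(memory, query_sims, decay=0.9):
    """Combine previously stored attention maps, weighting each by its
    relevance to the current question (`query_sims`) and by recency
    (older entries decay more). A toy rendering of attention-memory
    retrieval for reference resolution."""
    t = len(memory)
    recency = decay ** np.arange(t - 1, -1, -1)   # oldest decays most
    w = query_sims * recency
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, memory))
```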
no code implementations • ICCV 2017 • Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han
To address this objective, we automatically generate a customized synthetic VideoQA dataset using Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity.
1 code implementation • 8 Jun 2016 • Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
We propose a novel attention model that accurately attends to target objects of various scales and shapes in images.
1 code implementation • CVPR 2016 • Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han
We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions.
Image Retrieval with Multi-Modal Query • Parameter Prediction +2
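The dynamic parameter layer amounts to a small hypernetwork: a question embedding is mapped to the weights of a linear layer. The sketch below omits the parameter hashing the paper uses to keep the prediction compact; shapes and names are illustrative:

```python
import numpy as np

def dynamic_layer(x, question_embed, w_hyper):
    """Question-conditioned linear layer: `w_hyper` maps the question
    embedding to a flat weight vector, which is reshaped into the
    layer's (d_in, d_out) weight matrix and applied to `x`."""
    d_in = x.shape[-1]
    d_out = w_hyper.shape[-1] // d_in
    w = (question_embed @ w_hyper).reshape(d_in, d_out)
    return x @ w
```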