Search Results for author: Byungseok Roh

Found 16 papers, 11 papers with code

Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss

no code implementations • COLING 2022 • Youhan Lee, Kyungtae Lim, Woonhyuk Baek, Byungseok Roh, Saehoon Kim

In this multilingual approach, a typical setup is to use pairs of (image and English-text) and translation pairs.


CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

1 code implementation • 21 Jan 2024 • Jawook Gu, Han-Cheol Cho, Jiho Kim, Kihyun You, Eun Kyoung Hong, Byungseok Roh

Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalability.

Benchmarking


Honeybee: Locality-enhanced Projector for Multimodal LLM

1 code implementation • 11 Dec 2023 • Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities.

Ranked #1 on Science Question Answering on ScienceQA (using extra training data)

Science Question Answering


Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

no code implementations • 4 Dec 2023 • Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

Specifically, the proposed method aims to learn arbitrary image-to-text mapping for pseudo-labeling of arbitrary concepts, named Pseudo-Labeling for Arbitrary Concepts (PLAC).

object-detection Open Vocabulary Object Detection +2


Large Language Models are Temporal and Causal Reasoners for Video Question Answering

1 code implementation • 24 Oct 2023 • Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA).

Ranked #1 on Video Question Answering on TVQA

Natural Language Understanding Question Answering +2


CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

1 code implementation • 20 Oct 2023 • Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via general prompts and utilizing multiple images and multiple sections in a radiologic report.

Retrieval


NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

no code implementations • 5 Sep 2023 • TaeHoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun

In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge.

Fairness Image Captioning


Open-Vocabulary Object Detection using Pseudo Caption Labels

no code implementations • 23 Mar 2023 • Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh

Recent open-vocabulary detection methods aim to detect novel objects by distilling knowledge from vision-language models (VLMs) trained on a vast amount of image-text pairs.

Image Captioning Knowledge Distillation +3


MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

1 code implementation • CVPR 2023 • Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning.

Ranked #2 on Video Captioning on YouCook2

Auxiliary Learning Multimodal Sentiment Analysis +10


Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

1 code implementation • ICCV 2023 • Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data which provides rich knowledge about the visual world for a captioning model.

Image Captioning Image Retrieval +1


Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

1 code implementation • CVPR 2023 • Junbum Cha, Jonghwan Mun, Byungseok Roh

Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.

Ranked #2 on Semantic Segmentation on CC3M-TagMask

Contrastive Learning Open Vocabulary Semantic Segmentation +4


Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

1 code implementation • ICLR 2022 • Byungseok Roh, Jaewoong Shin, Wuhyun Shin, Saehoon Kim

Deformable DETR uses multiscale features to improve performance; however, the number of encoder tokens increases by 20x compared to DETR, and the computation cost of the encoder attention remains a bottleneck.

Computational Efficiency object-detection +1


Spatially Consistent Representation Learning

2 code implementations • CVPR 2021 • Byungseok Roh, Wuhyun Shin, Ildoo Kim, Sungwoong Kim

While these contrastive methods mainly focus on generating invariant global representations at the image-level under semantic-preserving transformations, they are prone to overlook spatial consistency of local representations and therefore have a limitation in pretraining for localization tasks such as object detection and instance segmentation.

Contrastive Learning Image Classification +6


BABO: Background Activation Black-Out for Efficient Object Detection

no code implementations • 5 Feb 2020 • Byungseok Roh, Han-Cheol Cho, Myung-Ho Ju, Soon Hyung Pyo

Recent advances in deep learning have enabled complex real-world use cases composed of multiple vision tasks, and detection tasks are being shifted to the edge side as a pre-processing step of the entire workload.

Object object-detection +1


PVANet: Lightweight Deep Neural Networks for Real-time Object Detection

9 code implementations • 23 Nov 2016 • Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park

In object detection, reducing computational cost is as important as improving accuracy for most practical usages.

Object object-detection +1


PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

2 code implementations • 29 Aug 2016 • Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

This paper presents how we can achieve the state-of-the-art accuracy in multi-category object detection task while minimizing the computational cost by adapting and combining recent technical innovations.

General Classification object-detection +3

