Search Results for author: Byungseok Roh

Found 16 papers, 11 papers with code

CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

2 code implementations · 21 Jan 2024 · Jawook Gu, Kihyun You, Han-Cheol Cho, Jiho Kim, Eun Kyoung Hong, Byungseok Roh

Models trained on expert-annotated data are limited by data scarcity and pre-defined classes, which constrains their performance, flexibility, and scalability.

Tasks: Benchmarking

Honeybee: Locality-enhanced Projector for Multimodal LLM

1 code implementation · CVPR 2024 · Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities.

Ranked #2 on Science Question Answering on ScienceQA (using extra training data)

Tasks: Science Question Answering
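To make the projector idea above concrete, here is a minimal PyTorch sketch of a convolution-based, locality-preserving projector in the spirit of Honeybee's locality-enhanced design. The depthwise-separable layers, dimensions (vis_dim, llm_dim), and output token count are illustrative assumptions, not the paper's exact architecture.

```python
import torch.nn as nn

class LocalityProjector(nn.Module):
    """Illustrative locality-preserving projector: maps vision-encoder
    patch features to LLM input embeddings. Convolutions mix neighboring
    patches (locality); adaptive pooling controls the token budget."""

    def __init__(self, vis_dim=1024, llm_dim=4096, out_tokens=8):
        super().__init__()
        self.mix = nn.Sequential(
            # Depthwise + pointwise convs mix neighboring patches,
            # unlike a per-token linear projection or a global resampler.
            nn.Conv2d(vis_dim, vis_dim, 3, padding=1, groups=vis_dim),
            nn.Conv2d(vis_dim, vis_dim, 1),
            nn.GELU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(out_tokens)  # out_tokens^2 tokens
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, patch_feats):  # (B, N, C); assumes a square N = H*W grid
        b, n, c = patch_feats.shape
        h = w = int(n ** 0.5)
        x = patch_feats.transpose(1, 2).reshape(b, c, h, w)
        x = self.pool(self.mix(x))                       # (B, C, out, out)
        return self.proj(x.flatten(2).transpose(1, 2))   # (B, out^2, llm_dim)
```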

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

no code implementations · 4 Dec 2023 · Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

Specifically, the proposed method, named Pseudo-Labeling for Arbitrary Concepts (PLAC), aims to learn an arbitrary image-to-text mapping for pseudo-labeling arbitrary concepts.

Tasks: Image-to-Text, Object Detection (+3 more)

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

1 code implementation · 24 Oct 2023 · Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

We observe that the LLMs provide effective priors in exploiting "linguistic shortcuts" for temporal and causal reasoning in Video Question Answering (VideoQA).

Tasks: Natural Language Understanding, Question Answering (+3 more)

CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

1 code implementation · 20 Oct 2023 · Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

In this paper, we tackle the lack of image-text data in chest X-rays by expanding image-label pairs into image-text pairs via general prompts and by utilizing the multiple images and multiple sections in a radiology report.

Tasks: Retrieval
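The label-to-text expansion described above can be sketched as simple prompt templating; the templates below are hypothetical placeholders, not the prompts used in CXR-CLIP.

```python
import random

# Hypothetical templates; CXR-CLIP's actual prompts may differ.
POSITIVE = ["a chest x-ray showing {}", "findings consistent with {}"]
NEGATIVE = ["no evidence of {} on this chest x-ray"]

def labels_to_text(labels: dict[str, bool]) -> str:
    """Expand an image-label pair into an image-text pair by rendering
    each class label (present or absent) through a general prompt."""
    parts = [
        random.choice(POSITIVE if present else NEGATIVE).format(name)
        for name, present in labels.items()
    ]
    return ", ".join(parts)

# labels_to_text({"cardiomegaly": True, "pneumothorax": False})
# -> e.g. "a chest x-ray showing cardiomegaly,
#          no evidence of pneumothorax on this chest x-ray"
```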

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

1 code implementation · CVPR 2023 · Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning of the target task via auxiliary learning.

Tasks: Auxiliary Learning, Multimodal Sentiment Analysis (+10 more)
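As a rough illustration, a loss-combining module of this kind can be a tiny transformer over per-objective loss values. The sketch below omits MELTR's bilevel (meta-learning) training procedure, and all dimensions are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class MetaLossTransformer(nn.Module):
    """Illustrative plug-in module that non-linearly combines several
    auxiliary loss values into one training loss via a small transformer."""

    def __init__(self, num_losses, d_model=32):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # one token per loss value
        # Learned identity embedding so the model can tell losses apart.
        self.loss_id = nn.Parameter(torch.randn(num_losses, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, losses):  # (B, num_losses)
        x = self.embed(losses.unsqueeze(-1)) + self.loss_id  # (B, L, d)
        x = self.encoder(x)                                  # non-linear mixing
        return self.head(x).sum(dim=(1, 2)).mean()           # combined scalar
```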

Open-Vocabulary Object Detection using Pseudo Caption Labels

no code implementations · 23 Mar 2023 · Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh

Recent open-vocabulary detection methods aim to detect novel objects by distilling knowledge from vision-language models (VLMs) trained on vast numbers of image-text pairs.

Tasks: Image Captioning, Knowledge Distillation (+3 more)
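The VLM-distillation setup the excerpt refers to (the line of prior work this paper builds on) reduces, in its simplest form, to pulling a detector's region embeddings toward a frozen VLM's embeddings of the same crops. This is a generic sketch of that family of methods, not the paper's pseudo-caption-label approach.

```python
import torch.nn.functional as F

def vlm_distill_loss(region_embeds, vlm_region_embeds):
    """Generic region-level distillation for open-vocabulary detection:
    align the detector's region embeddings (trainable) with a frozen
    VLM's image embeddings of the same region crops."""
    region_embeds = F.normalize(region_embeds, dim=-1)
    vlm_region_embeds = F.normalize(vlm_region_embeds.detach(), dim=-1)
    return F.l1_loss(region_embeds, vlm_region_embeds)
```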

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

1 code implementation · ICCV 2023 · Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

Image captioning is a task that can readily take advantage of large-scale web-crawled data, which provides a captioning model with rich knowledge about the visual world.

Tasks: Image Captioning, Image Retrieval (+1 more)

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

1 code implementation · CVPR 2023 · Junbum Cha, Jonghwan Mun, Byungseok Roh

Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.

Tasks: Contrastive Learning, Open Vocabulary Semantic Segmentation (+4 more)
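A text-grounded mask of the kind named in the title can be sketched, assuming CLIP-style dense patch and text embeddings, as a per-patch similarity map. Thresholding and the training losses are omitted, and this is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def text_grounded_mask(patch_embeds, text_embed, tau=0.07):
    """Derive a soft segmentation mask from dense patch embeddings and a
    text embedding via cosine similarity, as in open-world segmentation
    trained only on image-text pairs."""
    patch = F.normalize(patch_embeds, dim=-1)       # (B, H*W, D)
    text = F.normalize(text_embed, dim=-1)          # (B, D)
    sim = torch.einsum("bnd,bd->bn", patch, text)   # (B, H*W)
    return torch.sigmoid(sim / tau)                 # soft mask in [0, 1]
```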

Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity

1 code implementation · ICLR 2022 · Byungseok Roh, Jaewoong Shin, Wuhyun Shin, Saehoon Kim

Deformable DETR uses multiscale features to improve performance; however, the number of encoder tokens increases 20x compared to DETR, and the computational cost of the encoder attention remains a bottleneck.

Tasks: Computational Efficiency, Decoder (+2 more)
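The learnable-sparsity idea in the title can be sketched as a scoring network that keeps only a top fraction of encoder tokens for attention. The paper's actual saliency supervision (e.g., derived from decoder cross-attention) is omitted, and the dimensions are illustrative.

```python
import torch.nn as nn

class TokenSelector(nn.Module):
    """Illustrative learnable token sparsification: score multiscale
    encoder tokens and keep only the top keep_ratio fraction, so encoder
    attention runs over far fewer tokens."""

    def __init__(self, d_model=256, keep_ratio=0.1):
        super().__init__()
        self.score = nn.Linear(d_model, 1)
        self.keep_ratio = keep_ratio

    def forward(self, tokens):                   # (B, N, D)
        b, n, _ = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score(tokens).squeeze(-1)  # (B, N) saliency scores
        topk = scores.topk(k, dim=1).indices     # (B, k)
        # Gather the salient tokens; the rest bypass the encoder update.
        idx = topk.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return tokens.gather(1, idx), topk
```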

Spatially Consistent Representation Learning

2 code implementations · CVPR 2021 · Byungseok Roh, Wuhyun Shin, Ildoo Kim, Sungwoong Kim

While these contrastive methods mainly focus on generating invariant global image-level representations under semantic-preserving transformations, they tend to overlook the spatial consistency of local representations and are therefore limited as pretraining for localization tasks such as object detection and instance segmentation.

Tasks: Contrastive Learning, Image Classification (+6 more)
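A minimal sketch of the spatial-consistency idea: pool local features from matched regions of two augmented views and penalize their disagreement. The region-matching logic, projection/prediction heads, and stop-gradient branch of a BYOL-style pipeline are all omitted; the box format follows torchvision's roi_align convention.

```python
import torch.nn.functional as F
from torchvision.ops import roi_align

def spatial_consistency_loss(feat1, feat2, boxes1, boxes2):
    """Keep local representations of the same image region consistent
    across two augmented views. boxes1/boxes2 are (K, 5) tensors of
    [batch_idx, x1, y1, x2, y2] locating the same K regions in each
    view's coordinate frame."""
    r1 = roi_align(feat1, boxes1, output_size=1).flatten(1)  # (K, C)
    r2 = roi_align(feat2, boxes2, output_size=1).flatten(1)  # (K, C)
    # 2 - 2*cos(r1, r2): standard normalized-MSE / BYOL-style objective.
    return (2 - 2 * F.cosine_similarity(r1, r2, dim=-1)).mean()
```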

BABO: Background Activation Black-Out for Efficient Object Detection

no code implementations · 5 Feb 2020 · Byungseok Roh, Han-Cheol Cho, Myung-Ho Ju, Soon Hyung Pyo

Recent advances in deep learning have enabled complex real-world use cases composed of multiple vision tasks, and detection tasks are being shifted to the edge as a pre-processing step of the entire workload.

Tasks: Object Detection (+1 more)

PVANet: Lightweight Deep Neural Networks for Real-time Object Detection

9 code implementations · 23 Nov 2016 · Sanghoon Hong, Byungseok Roh, Kye-Hyeon Kim, Yeongjae Cheon, Minje Park

In object detection, reducing computational cost is as important as improving accuracy for most practical applications.

Tasks: Object Detection (+1 more)

PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

2 code implementations · 29 Aug 2016 · Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

This paper presents how to achieve state-of-the-art accuracy in multi-category object detection while minimizing computational cost by adapting and combining recent technical innovations.

Tasks: General Classification, Object Detection (+3 more)
