Search Results for author: Sangpil Kim

Found 19 papers, 5 papers with code

A Large-scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks with Deep Neural Networks

1 code implementation • ECCV 2020 • Sangpil Kim, Hyung-gun Chi, Xiao Hu, Qi-Xing Huang, Karthik Ramani

We introduce a large-scale annotated mechanical components benchmark for classification and retrieval tasks named MechanicalComponents Benchmark (MCB): a large-scale dataset of 3D objects of mechanical components.

Retrieval

Paper
Code

CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection

no code implementations • 6 Mar 2024 • Gyusam Chang, Wonseok Roh, Sujin Jang, Dongwook Lee, Daehyun Ji, Gyeongrok Oh, Jinsun Park, Jinkyu Kim, Sangpil Kim

Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution.

3D Object Detection object-detection +1

Paper
Add Code

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

no code implementations • 11 Jan 2024 • Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang

Additionally, Parrot employs a joint optimization approach for the T2I model and the prompt expansion network, facilitating the generation of quality-aware text prompts, thus further enhancing the final image quality.

Reinforcement Learning (RL) Text-to-Image Generation

Paper
Add Code

MTVG : Multi-text Video Generation with Text-to-Video Models

no code implementations • 7 Dec 2023 • Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Hyeokmin Kwon, Sangpil Kim

Concerning the characteristics of video, multi-text conditioning incorporating sequential events is necessary for next-step video generation.

Video Generation

Paper
Add Code

Clustering-based Image-Text Graph Matching for Domain Generalization

no code implementations • 4 Oct 2023 • Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim

However, they use pivot embedding in global manner (i. e., aligning an image embedding with sentence-level text embedding), not fully utilizing the semantic cues of given text description.

Clustering Domain Generalization +2

Paper
Add Code

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

no code implementations • ICCV 2023 • Yujin Jeong, Wonjeong Ryoo, SeungHyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim

Hence, we propose The Power of Sound (TPoS) model to incorporate audio input that includes both changeable temporal semantics and magnitude.

Video Generation

Paper
Add Code

Soundini: Sound-Guided Diffusion for Natural Video Editing

1 code implementation • 13 Apr 2023 • Seung Hyun Lee, Sieun Kim, Innfarn Yoo, Feng Yang, Donghyeon Cho, Youngseo Kim, Huiwen Chang, Jinkyu Kim, Sangpil Kim

We propose a method for adding sound-guided visual effects to specific regions of videos with a zero-shot setting.

Denoising Optical Flow Estimation +1

Paper
Code

Event Fusion Photometric Stereo Network

no code implementations • 1 Mar 2023 • Wonjeong Ryoo, Giljoo Nam, Jae-Sang Hyun, Sangpil Kim

We present a novel method to estimate the surface normal of an object in an ambient light environment using RGB and event cameras.

Surface Normal Estimation

Paper
Add Code

Dual Policy Learning for Aggregation Optimization in Graph Neural Network-based Recommender Systems

1 code implementation • 21 Feb 2023 • Heesoo Jung, Sangpil Kim, Hogun Park

This framework adaptively determines high-order connectivity to aggregate users and items using dual policy learning.

Knowledge Graphs Recommendation Systems

Paper
Code

FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

no code implementations • 18 Jan 2023 • Gyeongrok Oh, Heon Gu, Jinkyu Kim, Sangpil Kim

Interference between overlapping gird patterns creates moire patterns, degrading the visual quality of an image that captures a screen of a digital display device by an ordinary digital camera.

SSIM

Paper
Add Code

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

no code implementations • 21 Nov 2022 • Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization.

Image Stylization Object +1

Paper
Add Code

Robust Sound-Guided Image Manipulation

no code implementations • 30 Aug 2022 • Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible manipulation results than the state-of-the-art text and sound-guided image manipulation methods, which are further confirmed by our human evaluations.

Image Manipulation

Paper
Add Code

ORA3D: Overlap Region Aware Multi-view 3D Object Detection

no code implementations • 2 Jul 2022 • Wonseok Roh, Gyusam Chang, Seokha Moon, Giljoo Nam, Chanyoung Kim, Younghyun Kim, Jinkyu Kim, Sangpil Kim

Current multi-view 3D object detection methods often fail to detect objects in the overlap region properly, and the networks' understanding of the scene is often limited to that of a monocular detection network.

Ranked #6 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Disparity Estimation +4

Paper
Add Code

Sound-Guided Semantic Video Generation

no code implementations • 20 Apr 2022 • Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation.

Video Editing Video Generation

Paper
Add Code

Sound-Guided Semantic Image Manipulation

1 code implementation • CVPR 2022 • Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space.

Audio Classification Image Classification +2

Paper
Code

Egocentric View Hand Action Recognition by Leveraging Hand Surface and Hand Grasp Type

no code implementations • 8 Sep 2021 • Sangpil Kim, Jihyun Bae, Hyunggun Chi, Sunghee Hong, Byoung Soo Koh, Karthik Ramani

We introduce a multi-stage framework that uses mean curvature on a hand surface and focuses on learning interaction between hand and object by analyzing hand grasp type for hand action recognition in egocentric videos.

Action Recognition Object +1

Paper
Add Code

Latent Transformations for Object View Points Synthesis

no code implementations • 12 Jul 2018 • Sangpil Kim, Nick Winovich, Guang Lin, Karthik Ramani

We propose a fully-convolutional conditional generative model, the latent transformation neural network (LTNN), capable of view synthesis using a light-weight neural network suited for real-time applications.

Decoder Object