Search Results for author: Sangpil Kim

Found 35 papers, 6 papers with code

A Large-scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks with Deep Neural Networks

1 code implementation ECCV 2020 Sangpil Kim, Hyung-gun Chi, Xiao Hu, Qi-Xing Huang, Karthik Ramani

We introduce a large-scale annotated mechanical components benchmark for classification and retrieval tasks named Mechanical Components Benchmark (MCB): a large-scale dataset of 3D objects of mechanical components.

Retrieval

3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation

no code implementations 19 Mar 2025 Gyeongrok Oh, Sungjune Kim, Heeju Ko, Hyung-gun Chi, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sungjoon Choi, Sujin Jang, Sangpil Kim

The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction.

CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting

no code implementations 17 Mar 2025 Sumin In, Youngdong Jang, Utae Jeong, MinHyuk Jang, Hyeongcheol Park, Eunbyung Park, Sangpil Kim

3D Gaussian Splatting (3DGS) enables rapid differentiable rendering for 3D reconstruction and novel view synthesis, leading to its widespread commercial use.

3DGS 3D Reconstruction +3

AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis

no code implementations 17 Mar 2025 Hadam Baek, Hannie Shin, Jiyoung Seo, Chanwoo Kim, Saerom Kim, Hyeongbok Kim, Sangpil Kim

Accurately modeling sound propagation with complex real-world environments is essential for Novel View Acoustic Synthesis (NVAS).

3DGS

CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image

no code implementations 17 Dec 2024 Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim

Recently, generalizable feed-forward methods based on 3D Gaussian Splatting have gained significant attention for their potential to reconstruct 3D scenes using finite resources.

3D Scene Reconstruction Novel View Synthesis

LVMark: Robust Watermark for Latent Video Diffusion Models

no code implementations 12 Dec 2024 MinHyuk Jang, Youngdong Jang, JaeHyeok Lee, Feng Yang, Gyeongrok Oh, Jongheon Jeong, Sangpil Kim

We optimize both the watermark decoder and the latent decoder of diffusion model, effectively balancing the trade-off between visual quality and bit accuracy.

Decoder

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

no code implementations 26 Nov 2024 Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park

Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, resulting in reciprocal improvements in both pose accuracy and 3D reconstruction quality.

3D Reconstruction Pose Estimation

Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation

no code implementations 31 Oct 2024 Kyungjin Seo, Junghoon Seo, Hanseok Jeong, Sangpil Kim, Sang Ho Yoon

We present PiMForce, a novel framework that enhances hand pressure estimation by leveraging 3D hand posture information to augment forearm surface electromyography (sEMG) signals.

Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection

no code implementations 29 Oct 2024 Gyusam Chang, Jiwon Lee, Donghyun Kim, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sujin Jang, Sangpil Kim

However, typical supervised learning approaches face challenges in achieving satisfactory adaptation toward unseen and unlabeled target datasets (i.e., direct transfer) due to the inevitable geometric misalignment between the source and target domains.

3D Object Detection Domain Generalization +1

3D-GSW: 3D Gaussian Splatting for Robust Watermarking

no code implementations 20 Sep 2024 Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim

In this paper, we introduce a robust watermarking method for 3D-GS that secures copyright of both the model and its rendered images.

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

no code implementations 17 Jul 2024 Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim

In this work, we propose a novel method that also incorporates visual input from surround-view cameras, allowing the model to utilize visual cues such as human gazes and gestures, road conditions, vehicle turn signals, etc., which are typically hidden from the model in prior methods.

Autonomous Vehicles Language Modeling +4

WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

no code implementations CVPR 2024 Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim

The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth.

NeRF

CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection

no code implementations 6 Mar 2024 Gyusam Chang, Wonseok Roh, Sujin Jang, Dongwook Lee, Daehyun Ji, Gyeongrok Oh, Jinsun Park, Jinkyu Kim, Sangpil Kim

Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution.

3D Object Detection object-detection +1

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

no code implementations 11 Jan 2024 Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang

We use the novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, resulting in a significant improvement in image quality and allowing control over the trade-off between different rewards using a reward-related prompt during inference.

Reinforcement Learning (RL) Text-to-Image Generation

Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior

no code implementations CVPR 2024 Wonseok Roh, Hwanhee Jung, Giljoo Nam, Jinseop Yeom, Hyunje Park, Sang Ho Yoon, Sangpil Kim

While recent 3D instance segmentation approaches show promising results based on transformer architectures, they often fail to correctly identify instances with similar appearances.

3D Instance Segmentation Language Modeling +2

MEVG: Multi-event Video Generation with Text-to-Video Models

no code implementations 7 Dec 2023 Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim

We introduce a novel diffusion-based video generation method, generating a video showing multiple events given multiple individual sentences from the user.

Video Generation

Clustering-based Image-Text Graph Matching for Domain Generalization

1 code implementation 4 Oct 2023 Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim

However, they use pivot embedding in a global manner (i.e., aligning an image embedding with sentence-level text embedding), which does not fully utilize the semantic cues of given text description.

Clustering Domain Generalization +3

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

no code implementations ICCV 2023 Yujin Jeong, Wonjeong Ryoo, SeungHyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim

Hence, we propose The Power of Sound (TPoS) model to incorporate audio input that includes both changeable temporal semantics and magnitude.

Video Generation

Event Fusion Photometric Stereo Network

no code implementations 1 Mar 2023 Wonjeong Ryoo, Giljoo Nam, Jae-Sang Hyun, Sangpil Kim

We present a novel method to estimate the surface normal of an object in an ambient light environment using RGB and event cameras.

Surface Normal Estimation

Dual Policy Learning for Aggregation Optimization in Graph Neural Network-based Recommender Systems

1 code implementation 21 Feb 2023 Heesoo Jung, Sangpil Kim, Hogun Park

This framework adaptively determines high-order connectivity to aggregate users and items using dual policy learning.

Graph Neural Network Knowledge Graphs

FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment

no code implementations 18 Jan 2023 Gyeongrok Oh, Sungjune Kim, Heon Gu, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Moire patterns, created by the interference between overlapping grid patterns in the pixel space, degrade the visual quality of images and videos.

SSIM

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

no code implementations 21 Nov 2022 Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

We present a novel framework, Localized Image Stylization with Audio (LISA), which performs audio-driven localized image stylization.

Image Stylization Object +1

Robust Sound-Guided Image Manipulation

no code implementations 30 Aug 2022 Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible manipulation results than the state-of-the-art text and sound-guided image manipulation methods, which are further confirmed by our human evaluations.

Image Manipulation

ORA3D: Overlap Region Aware Multi-view 3D Object Detection

no code implementations 2 Jul 2022 Wonseok Roh, Gyusam Chang, Seokha Moon, Giljoo Nam, Chanyoung Kim, Younghyun Kim, Jinkyu Kim, Sangpil Kim

Current multi-view 3D object detection methods often fail to detect objects in the overlap region properly, and the networks' understanding of the scene is often limited to that of a monocular detection network.

3D Object Detection Disparity Estimation +4

Sound-Guided Semantic Video Generation

no code implementations 20 Apr 2022 Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation.

Video Editing Video Generation

Sound-Guided Semantic Image Manipulation

1 code implementation CVPR 2022 Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space.

Audio Classification Image Classification +2

Egocentric View Hand Action Recognition by Leveraging Hand Surface and Hand Grasp Type

no code implementations 8 Sep 2021 Sangpil Kim, Jihyun Bae, Hyunggun Chi, Sunghee Hong, Byoung Soo Koh, Karthik Ramani

We introduce a multi-stage framework that uses mean curvature on a hand surface and focuses on learning interaction between hand and object by analyzing hand grasp type for hand action recognition in egocentric videos.

Action Recognition Object +1

Latent Transformations for Object View Points Synthesis

no code implementations 12 Jul 2018 Sangpil Kim, Nick Winovich, Guang Lin, Karthik Ramani

We propose a fully-convolutional conditional generative model, the latent transformation neural network (LTNN), capable of view synthesis using a light-weight neural network suited for real-time applications.

Decoder Object

Learning Hand Articulations by Hallucinating Heat Distribution

no code implementations ICCV 2017 Chiho Choi, Sangpil Kim, Karthik Ramani

As an additional modality to depth data, we present a function of geometric properties on the surface of the hand described by heat diffusion.

Descriptive Hallucination +1
