no code implementations • 8 Apr 2025 • Tri Ton, Ji Woo Hong, Chang D. Yoo
This paper introduces Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning (TARO), a novel framework for high-fidelity and temporally coherent video-to-audio synthesis.
no code implementations • CVPR 2025 • Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo
This paper introduces ITA-MDT, the Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On (IVTON), designed to overcome the limitations of previous approaches by leveraging the Masked Diffusion Transformer (MDT) for improved handling of both global garment context and fine-grained details.
no code implementations • 13 Feb 2025 • Trung X. Pham, Zhang Kang, Ji Woo Hong, Xuran Zheng, Chang D. Yoo
We propose E-MD3C ($\underline{E}$fficient $\underline{M}$asked $\underline{D}$iffusion Transformer with Disentangled $\underline{C}$onditions and $\underline{C}$ompact $\underline{C}$ollector), a highly efficient framework for zero-shot object image customization.
no code implementations • 16 Aug 2024 • Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo
3D pathway generates spatially accurate class-agnostic mask proposals of common indoor objects from 3D point cloud data using a pre-trained 3D model, while 2D pathway utilizes pre-trained open-vocabulary instance segmentation model to identify a diverse array of object proposals from multi-view RGB-D images.
3D Instance Segmentation
open vocabulary 3d instance segmentation
+1
1 code implementation • 25 Jul 2024 • Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong, Chang D. Yoo
Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image.
no code implementations • 10 Dec 2023 • Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong, Chang D. Yoo
To this end, this paper proposes Neutral Editing (NeuEdit) framework to enable complex non-rigid editing by changing the motion of a person/object in a video, which has never been attempted before.
no code implementations • 17 Nov 2022 • Trung X. Pham, Axi Niu, Zhang Kang, Sultan Rizky Madjid, Ji Woo Hong, Daehyeok Kim, Joshua Tian Jin Tee, Chang D. Yoo
To solve this problem, we propose "residual momentum" to directly reduce this gap to encourage the student to learn the representation as close to that of the teacher as possible, narrow the performance gap with the teacher, and significantly improve the existing SSL.
1 code implementation • 17 Oct 2022 • Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo
Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query.