no code implementations • 8 Apr 2025 • Tri Ton, Ji Woo Hong, Chang D. Yoo
This paper introduces Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning (TARO), a novel framework for high-fidelity and temporally coherent video-to-audio synthesis.
no code implementations • CVPR 2025 • Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo
This paper introduces ITA-MDT, the Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On (IVTON), designed to overcome the limitations of previous approaches by leveraging the Masked Diffusion Transformer (MDT) for improved handling of both global garment context and fine-grained details.
no code implementations • 3 Oct 2024 • Trung X. Pham, Tri Ton, Chang D. Yoo
We introduce MDSGen, a novel framework for vision-guided open-domain sound generation optimized for model parameter size, memory consumption, and inference speed.
no code implementations • 16 Aug 2024 • Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo
The 3D pathway generates spatially accurate class-agnostic mask proposals of common indoor objects from 3D point cloud data using a pre-trained 3D model, while the 2D pathway uses a pre-trained open-vocabulary instance segmentation model to identify a diverse array of object proposals from multi-view RGB-D images.
3D Instance Segmentation
Open-Vocabulary 3D Instance Segmentation
1 code implementation • 31 Jul 2024 • Tung M. Luu, Haeyong Kang, Tri Ton, Thanh Nguyen, Chang D. Yoo
Reinforcement Learning (RL) agents that perform well in a training environment remain vulnerable to adversarial perturbations of their input observations during deployment.
no code implementations • 18 May 2024 • Thanh Nguyen, Tung M. Luu, Tri Ton, Chang D. Yoo
This paper proposes a framework to enhance the robustness of offline RL models by leveraging advanced adversarial attacks and defenses.