no code implementations • 16 Apr 2025 • Subin Kim, Hoonrae Kim, Jihyun Lee, Yejin Jeon, Gary Geunbae Lee
Recent studies have explored the use of large language models (LLMs) in psychotherapy; however, text-based cognitive behavioral therapy (CBT) models often struggle with client resistance, which can weaken the therapeutic alliance.
no code implementations • 7 Apr 2025 • Jihyun Lee, Weipeng Xu, Alexander Richard, Shih-En Wei, Shunsuke Saito, Shaojie Bai, Te-Li Wang, Minhyuk Sung, Tae-Kyun Kim, Jason Saragih
To enable real-time inference, we introduce (1) cascaded body-hand denoising diffusion, which effectively models the correlation between egocentric body and hand motions in a fast, feed-forward manner, and (2) diffusion distillation, which enables high-quality motion estimation with a single denoising step.
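The cascade described above can be illustrated with a toy sketch: a body denoiser runs first, and a hand denoiser is conditioned on its output, with distillation collapsing the process to a single feed-forward denoising step. All names, dimensions, and the linear/tanh denoisers here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper).
BODY_DIM, HAND_DIM = 6, 4

def body_denoiser(noisy_body, head_signal, w):
    # Stage 1: predict body motion from noisy input plus an egocentric signal.
    return np.tanh(w @ np.concatenate([noisy_body, head_signal]))

def hand_denoiser(noisy_hands, body_estimate, w):
    # Stage 2: hand denoising conditioned on the body estimate, so the
    # body-hand correlation is modeled in a feed-forward cascade.
    return np.tanh(w @ np.concatenate([noisy_hands, body_estimate]))

def cascaded_single_step(head_signal, w_body, w_hand):
    # A distilled model performs one denoising step: start from pure noise
    # and produce the motion estimate in a single forward pass.
    noisy_body = rng.standard_normal(BODY_DIM)
    noisy_hands = rng.standard_normal(HAND_DIM)
    body = body_denoiser(noisy_body, head_signal, w_body)
    hands = hand_denoiser(noisy_hands, body, w_hand)
    return body, hands

w_body = rng.standard_normal((BODY_DIM, BODY_DIM + 3))
w_hand = rng.standard_normal((HAND_DIM, HAND_DIM + BODY_DIM))
body, hands = cascaded_single_step(rng.standard_normal(3), w_body, w_hand)
```

The single-step structure is what makes real-time inference plausible: no iterative sampling loop remains at test time.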
no code implementations • 28 Mar 2025 • Yunhong Min, Daehyeon Choi, Kyeongmin Yeo, Jihyun Lee, Minhyuk Sung
We introduce ORIGEN, the first zero-shot method for 3D orientation grounding in text-to-image generation across multiple objects and diverse categories.
no code implementations • 10 Sep 2024 • Jihyun Lee, Solee Im, Wonjun Lee, Gary Geunbae Lee
Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations.
Automatic Speech Recognition
no code implementations • 10 Sep 2024 • Jihyun Lee, Gary Geunbae Lee
Traditional dialogue state tracking approaches heavily rely on extensive training data and handcrafted features, limiting their scalability and adaptability to new domains.
no code implementations • 6 Sep 2024 • Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je-Hwan Ryu, Woontack Woo, Tae-Kyun Kim
Accurate hand and object 3D meshes are obtained by fitting the hand parametric model (MANO) and the hand implicit function (HALO) to multi-view RGBD frames, with the MoCap system used only for objects.
no code implementations • 29 Aug 2024 • JiHwan Kim, Miso Lee, Cheol-Ho Cho, Jihyun Lee, Jae-Pil Heo
Temporal Action Detection (TAD) is fundamental yet challenging for real-world video applications.
no code implementations • CVPR 2024 • Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim
Sampling from our model yields plausible and diverse two-hand shapes in close interaction with or without an object.
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.
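The matching objective above can be sketched as minimizing the distance between encoder outputs and target speech representations. This is a toy linear encoder with an MSE loss; the channel count, frame count, latent size, and the loss itself are assumptions for illustration (the real targets are Wav2Vec 2.0 features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 ECoG channels, 50 time frames, 32-dim latents
# (actual Wav2Vec 2.0 representations are much higher-dimensional).
N_CH, T, D = 16, 50, 32

def encode(ecog, w):
    # Toy linear encoder: map each ECoG frame to a latent embedding.
    return ecog @ w  # (T, N_CH) @ (N_CH, D) -> (T, D)

def matching_loss(ecog, target_embed, w):
    # Train the encoder so its outputs match the speech representations
    # of the corresponding spoken utterance (MSE stands in for the
    # paper's actual objective).
    pred = encode(ecog, w)
    return float(np.mean((pred - target_embed) ** 2))

w = rng.standard_normal((N_CH, D)) * 0.1
ecog = rng.standard_normal((T, N_CH))
target = encode(ecog, w)  # stand-in target; loss is zero by construction
loss = matching_loss(ecog, target, w)
```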
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.
1 code implementation • 4 Dec 2023 • Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee
We address this by investigating synthetic audio data for audio-based DST.
no code implementations • 17 Mar 2023 • Jihyun Lee, Seungyeon Seo, Yunsu Kim, Gary Geunbae Lee
We present our work on Track 2 in the Dialog System Technology Challenges 11 (DSTC11).
1 code implementation • CVPR 2023 • Jihyun Lee, Minhyuk Sung, Honggyu Choi, Tae-Kyun Kim
To handle the shape complexity and interaction context between two hands, Im2Hands models the occupancy volume of two hands - conditioned on an RGB image and coarse 3D keypoints - by two novel attention-based modules responsible for (1) initial occupancy estimation and (2) context-aware occupancy refinement, respectively.
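The two-stage occupancy pipeline described above can be sketched in miniature: a coarse occupancy estimate from 3D keypoints, followed by a refinement pass. Both stages here are toy stand-ins (a distance heuristic and a scalar "context" factor), not the paper's attention-based, image-conditioned modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_occupancy(points, keypoints):
    # Stage (1): coarse occupancy from distance to the nearest 3D keypoint.
    d = np.linalg.norm(points[:, None, :] - keypoints[None, :, :], axis=-1)
    return 1.0 / (1.0 + d.min(axis=1))  # values in (0, 1]

def refine_occupancy(occ, context):
    # Stage (2): context-aware refinement; a scalar stands in for
    # attention over image features and the other hand's state.
    return np.clip(occ * context, 0.0, 1.0)

points = rng.standard_normal((100, 3))    # query points in space
keypoints = rng.standard_normal((21, 3))  # 21 coarse hand keypoints
occ = refine_occupancy(initial_occupancy(points, keypoints), context=0.9)
```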
no code implementations • 17 Nov 2022 • Jihyun Lee, Chaebin Lee, Yunsu Kim, Gary Geunbae Lee
In dialogue state tracking (DST), labeling the dataset involves considerable human labor.
no code implementations • 16 Sep 2022 • Jihyun Lee, Gary Geunbae Lee
A few-shot dialogue state tracking (DST) model tracks user requests in dialogue with reliable accuracy, even with only a small amount of data.
no code implementations • CVPR 2022 • Jihyun Lee, Minhyuk Sung, HyunJin Kim, Tae-Kyun Kim
We propose a framework that can deform an object in a 2D image as it exists in 3D space.
no code implementations • 26 Jul 2021 • Se-Yun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions.
Automatic Speech Recognition
no code implementations • 1 Apr 2021 • Minsu Kang, Jihyun Lee, Simin Kim, Injung Kim
We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesizes speech in real time on a single CPU thread.
no code implementations • 1 Jan 2021 • Tae Gyoon Kang, Ho-Gyeong Kim, Min-Joong Lee, Jihyun Lee, Seongmin Ok, Hoshik Lee, Young Sang Choi
Transformers with soft attention have been widely adopted for various sequence-to-sequence tasks.