1 code implementation • CVPR 2023 • Jihyun Lee, Minhyuk Sung, Honggyu Choi, Tae-Kyun Kim
To handle the shape complexity and interaction context between two hands, Im2Hands models the occupancy volume of two hands, conditioned on an RGB image and coarse 3D keypoints, via two novel attention-based modules responsible for (1) initial occupancy estimation and (2) context-aware occupancy refinement, respectively.
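The two-stage design above can be sketched as follows. This is purely an illustration, not the paper's architecture: the single-head scaled dot-product attention, the linear output head, and all dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values

rng = np.random.default_rng(0)
n_points, d = 8, 16                      # query points per hand, feature dim (assumed)
img_feats = rng.normal(size=(32, d))     # image feature tokens (hypothetical)
left_q = rng.normal(size=(n_points, d))  # per-point queries, left hand
right_q = rng.normal(size=(n_points, d)) # per-point queries, right hand

# (1) Initial occupancy estimation: each hand's query points attend
#     to the image features independently.
left_init = cross_attention(left_q, img_feats, img_feats)
right_init = cross_attention(right_q, img_feats, img_feats)

# (2) Context-aware refinement: each hand's features attend to the
#     other hand's features, modeling the two-hand interaction context.
left_refined = left_init + cross_attention(left_init, right_init, right_init)
right_refined = right_init + cross_attention(right_init, left_init, left_init)

# A final projection maps refined features to an occupancy value in (0, 1).
w_out = rng.normal(size=(d, 1))
left_occ = 1 / (1 + np.exp(-(left_refined @ w_out)))
print(left_occ.shape)  # one occupancy value per query point
```

The refinement stage is the part that distinguishes this setup from per-hand prediction: each hand's occupancy features are updated with attention over the other hand's features before the final occupancy readout.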
1 code implementation • 4 Dec 2023 • Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee
We address this by investigating synthetic audio data for audio-based DST.
no code implementations • 1 Jan 2021 • Tae Gyoon Kang, Ho-Gyeong Kim, Min-Joong Lee, Jihyun Lee, Seongmin Ok, Hoshik Lee, Young Sang Choi
Transformers with soft attention have been widely adopted to various sequence-to-sequence tasks.
no code implementations • 1 Apr 2021 • Minsu Kang, Jihyun Lee, Simin Kim, Injung Kim
We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesizes speech in real time on a single CPU thread.
no code implementations • 26 Jul 2021 • Se-Yun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions.
no code implementations • CVPR 2022 • Jihyun Lee, Minhyuk Sung, HyunJin Kim, Tae-Kyun Kim
We propose a framework that can deform an object in a 2D image as if it existed in 3D space.
no code implementations • 16 Sep 2022 • Jihyun Lee, Gary Geunbae Lee
A few-shot dialogue state tracking (DST) model tracks user requests in dialogue with reliable accuracy even from a small amount of data.
no code implementations • 17 Nov 2022 • Jihyun Lee, Chaebin Lee, Yunsu Kim, Gary Geunbae Lee
In dialogue state tracking (DST), labeling the dataset involves considerable human labor.
no code implementations • 17 Mar 2023 • Jihyun Lee, Seungyeon Seo, Yunsu Kim, Gary Geunbae Lee
We present our work on Track 2 in the Dialog System Technology Challenges 11 (DSTC11).
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech.
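The embedding-matching idea can be sketched with a toy training loop. Everything here is an illustrative assumption: a plain linear encoder, random stand-in targets instead of real Wav2Vec 2.0 representations, an MSE matching loss, and vanilla gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_ecog, d_latent = 64, 32, 16  # frames, ECoG channels, embedding dim (assumed)

ecog = rng.normal(size=(n, d_ecog))       # synthetic ECoG frames (stand-in data)
targets = rng.normal(size=(n, d_latent))  # stand-in for Wav2Vec 2.0 embeddings

W = np.zeros((d_ecog, d_latent))          # weights of a linear "encoder"
lr = 0.01

def mse(a, b):
    return ((a - b) ** 2).mean()

loss_before = mse(ecog @ W, targets)
for _ in range(200):
    pred = ecog @ W                            # encode ECoG into the latent space
    grad = 2 * ecog.T @ (pred - targets) / n   # gradient of the MSE matching loss
    W -= lr * grad                             # gradient-descent update
loss_after = mse(ecog @ W, targets)
print(loss_after < loss_before)  # the encoder's outputs move toward the targets
```

The point of the sketch is only the training signal: the encoder is optimized so its outputs align with fixed target embeddings from a pretrained speech model, rather than being trained to reconstruct the waveform directly.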
no code implementations • 21 Dec 2023 • Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang
In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signals in a multi-speaker setting.
no code implementations • 26 Mar 2024 • Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim
Sampling from our model yields plausible and diverse two-hand shapes in close interaction with or without an object.