Search Results for author: Chul Lee

Found 25 papers, 10 papers with code

MTL-SLT: Multi-Task Learning for Spoken Language Tasks

no code implementations NLP4ConvAI (ACL) 2022 Zhiqi Huang, Milind Rao, Anirudh Raju, Zhe Zhang, Bach Bui, Chul Lee

The proposed framework benefits from three key aspects: 1) pre-trained sub-networks for the ASR model and language model; 2) a multi-task learning objective that exploits shared knowledge across tasks; 3) end-to-end training of the ASR and downstream NLP tasks based on a sequence loss.

Automatic Speech Recognition (ASR) +5
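The multi-task objective mentioned in the abstract can be sketched as a weighted sum of per-task losses. The task names, loss values, and weights below are purely illustrative, not taken from the paper:

```python
def multi_task_loss(task_losses, weights):
    """Combine per-task losses into one training objective.

    task_losses: dict mapping task name -> scalar loss
    weights: dict mapping task name -> scalar weight
    """
    return sum(weights[t] * task_losses[t] for t in task_losses)

# Illustrative: an ASR sequence loss plus downstream NLU losses.
losses = {"asr": 2.0, "intent": 0.5, "slot": 0.8}
weights = {"asr": 1.0, "intent": 0.3, "slot": 0.3}
total = multi_task_loss(losses, weights)
```

In practice the weights are hyperparameters (or scheduled), and gradients from the shared encoder flow through all task heads at once.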

3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow

no code implementations15 Apr 2024 Felix Taubner, Prashant Raina, Mathieu Tuli, Eu Wern Teh, Chul Lee, Jinmiao Huang

Because such capture setups are expensive, and because 2D videos are widely available, recent methods have focused on monocular 3D face tracking.

Disentanglement Face Model

H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields

1 code implementation13 Feb 2024 Minyoung Park, Mirae Do, YeonJae Shin, Jaeseok Yoo, Jongkwang Hong, Joongrock Kim, Chul Lee

Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction.

Indoor Scene Reconstruction Object

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

no code implementations ICCV 2023 Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim

Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object's form is not a straightforward process.


Cross-modal transformers for infrared and visible image fusion

1 code implementation IEEE Transactions on Circuits and Systems for Video Technology 2023 Seonghyun Park, An Gia Vien, Chul Lee

In this work, we propose a cross-modal transformer-based fusion (CMTFusion) algorithm for infrared and visible image fusion that captures global interactions by faithfully extracting complementary information from source images.

Cross-Modal Retrieval Infrared And Visible Image Fusion +3
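The global cross-modal interactions described in the CMTFusion abstract can be illustrated with plain scaled dot-product cross-attention, where tokens from one modality (here infrared) attend over tokens from the other (visible). This is a generic sketch, not the paper's architecture; shapes and the random features are assumptions:

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Minimal scaled dot-product cross-attention: queries from one
    modality attend over keys/values from the other modality."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)      # (Nq, Nkv)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # rows sum to 1
    return attn @ kv_feats                          # (Nq, d)

# Illustrative: 4 infrared tokens querying 6 visible tokens, dim 8.
rng = np.random.default_rng(0)
ir = rng.standard_normal((4, 8))
vis = rng.standard_normal((6, 8))
fused = cross_attention(ir, vis)
```

Running the attention in both directions (infrared to visible and visible to infrared) is one common way such fusion modules exchange complementary information.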

Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis

no code implementations CVPR 2023 Hoseok Do, EunKyung Yoo, Taehyeong Kim, Chul Lee, Jin Young Choi

While 3D-based GAN techniques have been successfully applied to render photo-realistic 3D images with a variety of attributes while preserving view consistency, there has been little research on how to finely control 3D images without being limited to a specific category of objects or their properties.

3D-Aware Image Synthesis Attribute +1

Harmonic (Quantum) Neural Networks

no code implementations14 Dec 2022 Atiyo Ghosh, Antonio A. Gentile, Mario Dagrada, Chul Lee, Seong-hyok Kim, Hyukgeun Cha, Yunjun Choi, Brad Kim, Jeong-il Kye, Vincent E. Elfving

Harmonic functions are abundant in nature, appearing in limiting cases of Maxwell's and the Navier-Stokes equations, as well as the heat and the wave equations.

Inductive Bias Quantum Machine Learning +1

Exposure-Aware Dynamic Weighted Learning for Single-Shot HDR Imaging

1 code implementation European Conference on Computer Vision (ECCV) 2022 An Gia Vien, Chul Lee

We propose a novel single-shot high dynamic range (HDR) imaging algorithm based on exposure-aware dynamic weighted learning, which reconstructs an HDR image from a spatially varying exposure (SVE) raw image.

HDR Reconstruction Image Reconstruction +2
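The exposure-aware weighting idea can be illustrated with its classical hand-crafted analogue: a triangle ("hat") weight that trusts mid-range pixels and down-weights under- or over-exposed ones before averaging radiance estimates. The paper learns its weights from a single spatially-varying-exposure image; the multi-exposure values and exposure times below are only a toy stand-in:

```python
import numpy as np

def exposure_weight(pixels):
    """Triangle weighting: pixels near mid-range (0.5) get weight ~1,
    pixels near 0 or 1 get weight ~0 (a classic heuristic, not the
    paper's learned weights)."""
    return np.maximum(1e-6, 1.0 - np.abs(2.0 * pixels - 1.0))

def fuse(pixels, exposures):
    """Weighted average of per-exposure radiance estimates."""
    w = exposure_weight(pixels)
    radiance = pixels / exposures   # normalize by relative exposure
    return (w * radiance).sum() / w.sum()

vals = np.array([0.9, 0.5, 0.1])    # same scene point at 3 exposures
times = np.array([4.0, 1.0, 0.25])  # relative exposure times
hdr = fuse(vals, times)
```

The learned, dynamic weights in the paper play the role of `exposure_weight` here, but are predicted per pixel by the network.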

Depth Map Decomposition for Monocular Depth Estimation

1 code implementation23 Aug 2022 Jinyoung Jun, Jae-Han Lee, Chul Lee, Chang-Su Kim

We propose a novel algorithm for monocular depth estimation that decomposes a metric depth map into a normalized depth map and scale features.

Ranked #38 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Monocular Depth Estimation
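The decomposition idea, a metric depth map expressed as a normalized depth map plus scale information, can be illustrated with a simple min/max affine scale. The paper learns scale features from data; this is only a toy analogue showing that the decomposition is lossless:

```python
import numpy as np

def decompose(metric_depth):
    """Split a metric depth map into a normalized [0, 1] map and
    scale parameters (here just the min/max; the paper's scale
    features are learned)."""
    lo, hi = metric_depth.min(), metric_depth.max()
    norm = (metric_depth - lo) / (hi - lo)
    return norm, (lo, hi)

def recompose(norm_depth, scale):
    """Invert the decomposition to recover metric depth."""
    lo, hi = scale
    return norm_depth * (hi - lo) + lo

depth = np.array([[1.2, 3.4], [2.0, 5.0]])  # toy metric depth (meters)
norm, scale = decompose(depth)
restored = recompose(norm, scale)
```

Separating relative structure (the normalized map) from absolute scale is what lets such methods reuse large relative-depth datasets while still predicting metric depth.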

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

no code implementations25 May 2022 Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park

The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: in Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations).

Image Restoration Vocal Bursts Intensity Prediction

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer

no code implementations24 Feb 2022 Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.

Open Set Learning Speaker Identification

On joint training with interfaces for spoken language understanding

no code implementations30 Jun 2021 Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances.

Automatic Speech Recognition (ASR) +3

End-to-end Neural Diarization: From Transformer to Conformer

no code implementations14 Jun 2021 Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.

Data Augmentation
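A distinguishing feature of EEND systems such as this one is their output format: per-frame, per-speaker sigmoid activities with independent thresholds, so overlapping speech is representable (unlike clustering-based diarization, which assigns one speaker per frame). The logits below are made up for illustration:

```python
import numpy as np

def frame_activities(logits, threshold=0.5):
    """EEND-style decoding sketch: sigmoid each (frame, speaker)
    logit independently, then threshold. Multiple speakers may be
    active in the same frame."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

# 3 frames x 2 speakers; frame 1 has both speakers talking at once.
logits = np.array([[2.0, -1.0],
                   [1.5, 1.5],
                   [-2.0, 0.5]])
active = frame_activities(logits)
```

The Conformer vs. Transformer comparison in the paper concerns the encoder that produces these logits, not the decoding step.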

Learning Multiple Pixelwise Tasks Based on Loss Scale Balancing

1 code implementation ICCV 2021 Jae-Han Lee, Chul Lee, Chang-Su Kim

We propose a novel loss weighting algorithm, called loss scale balancing (LSB), for multi-task learning (MTL) of pixelwise vision tasks.

Multi-Task Learning
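The core idea of loss weighting for multi-task learning can be illustrated with a simplified stand-in: weight each task inversely to its current loss magnitude so that every task contributes comparably to the total. This is not the paper's exact LSB scheme (which balances both loss scale and its rate of change during training); the task names and values are invented:

```python
def balanced_weights(losses):
    """Weight each task inversely to its current loss scale, then
    normalize, so every weighted term is the same size (a simplified
    stand-in for loss scale balancing)."""
    inv = {t: 1.0 / max(l, 1e-8) for t, l in losses.items()}
    z = sum(inv.values())
    return {t: v / z for t, v in inv.items()}

losses = {"depth": 4.0, "normals": 1.0, "seg": 0.5}
w = balanced_weights(losses)
total = sum(w[t] * losses[t] for t in losses)
```

With these weights every term `w[t] * losses[t]` is identical, so no single task's loss scale dominates the shared gradient.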

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

no code implementations5 Nov 2020 Eunjung Han, Chul Lee, Andreas Stolcke

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.

Clustering Speaker Diarization +1

Cross-modal Learning for Multi-modal Video Categorization

no code implementations7 Mar 2020 Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Multi-modal machine learning (ML) models can process data in multiple modalities (e.g., video, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding, activity recognition).

Activity Recognition Object Detection +2

Exploiting Temporal Coherence for Multi-modal Video Categorization

no code implementations7 Feb 2020 Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding).

Object Detection +1

Semantic Line Detection and Its Applications

1 code implementation ICCV 2017 Jun-Tae Lee, Han-Ul Kim, Chul Lee, Chang-Su Kim

Then, we develop the line pooling layer to extract a feature vector for each candidate line from the feature maps.

Classification General Classification +4

A Maximum A Posteriori Estimation Framework for Robust High Dynamic Range Video Synthesis

no code implementations8 Dec 2016 Yuelong Li, Chul Lee, Vishal Monga

For HDR video, a stiff practical challenge presents itself in the form of accurate correspondence estimation of objects between video frames.

Image Generation Optical Flow Estimation
