no code implementations • NLP4ConvAI (ACL) 2022 • Zhiqi Huang, Milind Rao, Anirudh Raju, Zhe Zhang, Bach Bui, Chul Lee
The proposed framework benefits from three key aspects: 1) pre-trained sub-networks of the ASR model and the language model; 2) a multi-task learning objective that exploits shared knowledge across tasks; 3) end-to-end training of the ASR and downstream NLP tasks based on a sequence loss.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • ECCV 2020 • Junheum Park, Keunsoo Ko, Chul Lee, Chang-Su Kim
We propose a novel deep-learning-based video interpolation algorithm based on bilateral motion estimation.
Ranked #15 on Video Frame Interpolation on MSU Video Frame Interpolation (PSNR metric)
no code implementations • 14 Dec 2022 • Atiyo Ghosh, Antonio A. Gentile, Mario Dagrada, Chul Lee, Seong-hyok Kim, Hyukgeun Cha, Yunjun Choi, Brad Kim, Jeong-il Kye, Vincent E. Elfving
In this work, we derive exactly-harmonic (conventional- and quantum-) neural networks in two dimensions for simply-connected domains by leveraging the characteristics of holomorphic complex functions.
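The property this abstract relies on is a standard one: the real part of a holomorphic function satisfies Laplace's equation. The following is a minimal numerical sketch of that fact only, not the authors' network construction. The test function f(z) = exp(z) + z^3, the sample point, and the helper names `u` and `laplacian` are illustrative assumptions.

```python
import cmath


def u(x, y):
    """Real part of the holomorphic function f(z) = exp(z) + z**3 (illustrative choice)."""
    z = complex(x, y)
    return (cmath.exp(z) + z**3).real


def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of the 2D Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h**2


# Harmonic case: the Laplacian of Re(f) should vanish up to discretization error.
harmonic_residual = laplacian(u, 0.3, -0.7)

# Contrast case: x^2 + y^2 is not harmonic; its Laplacian is 4 everywhere.
paraboloid_laplacian = laplacian(lambda x, y: x * x + y * y, 0.3, -0.7)
```

Because harmonicity follows from the function class itself, a model built this way needs no penalty term to enforce Laplace's equation; the finite-difference check above is only a sanity test.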
1 code implementation • European Conference on Computer Vision (ECCV) 2022 • An Gia Vien, Chul Lee
We propose a novel single-shot high dynamic range (HDR) imaging algorithm based on exposure-aware dynamic weighted learning, which reconstructs an HDR image from a spatially varying exposure (SVE) raw image.
1 code implementation • 23 Aug 2022 • Jinyoung Jun, Jae-Han Lee, Chul Lee, Chang-Su Kim
We propose a novel algorithm for monocular depth estimation that decomposes a metric depth map into a normalized depth map and scale features.
Ranked #17 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
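The decomposition idea in the abstract above can be illustrated with a toy example. This is a sketch under a loud assumption: the scale is taken to be a single scalar, the mean depth. The listing does not say how the paper normalizes depth or represents its scale features, so the functions `decompose` and `recompose` are hypothetical and are not the authors' method.

```python
def decompose(depth):
    """Split a metric depth map into a normalized map and a scalar scale.

    Assumption: the scale is the mean depth of the map.
    """
    values = [d for row in depth for d in row]
    scale = sum(values) / len(values)
    normalized = [[d / scale for d in row] for row in depth]
    return normalized, scale


def recompose(normalized, scale):
    """Recover the metric depth map from its normalized map and scale."""
    return [[n * scale for n in row] for row in normalized]


depth = [[1.0, 2.0], [3.0, 6.0]]      # toy metric depth map, in metres
normalized, scale = decompose(depth)  # relative structure and global scale
restored = recompose(normalized, scale)
```

The normalized map carries only relative depth, so in this toy setting the scene geometry and the absolute scale can be handled separately and multiplied back together afterwards.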
no code implementations • 23 Jun 2022 • Jinmiao Huang, Waseem Gharbieh, Qianhui Wan, Han Suk Shim, Chul Lee
Current keyword spotting systems are typically trained with a large amount of pre-defined keywords.
no code implementations • 25 May 2022 • Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park
The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations).
no code implementations • 24 Feb 2022 • Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee
Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.
1 code implementation • ICCV 2021 • Junheum Park, Chul Lee, Chang-Su Kim
First, we predict symmetric bilateral motion fields to interpolate an anchor frame.
Ranked #6 on Video Frame Interpolation on MSU Video Frame Interpolation (VMAF metric)
no code implementations • 30 Jun 2021 • Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow
Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 14 Jun 2021 • Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke
We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.
1 code implementation • ICCV 2021 • Jae-Han Lee, Chul Lee, Chang-Su Kim
We propose a novel loss weighting algorithm, called loss scale balancing (LSB), for multi-task learning (MTL) of pixelwise vision tasks.
no code implementations • 5 Nov 2020 • Eunjung Han, Chul Lee, Andreas Stolcke
We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.
1 code implementation • 17 Jul 2020 • Junheum Park, Keunsoo Ko, Chul Lee, Chang-Su Kim
We propose a novel deep-learning-based video interpolation algorithm based on bilateral motion estimation.
Ranked #3 on Video Frame Interpolation on Middlebury
no code implementations • 6 May 2020 • Shanxin Yuan, Radu Timofte, Ales Leonardis, Gregory Slabaugh, Xiaotong Luo, Jiangtao Zhang, Yanyun Qu, Ming Hong, Yuan Xie, Cuihua Li, Dejia Xu, Yihao Chu, Qingyan Sun, Shuai Liu, Ziyao Zong, Nan Nan, Chenghua Li, Sangmin Kim, Hyungjoon Nam, Jisu Kim, Jechang Jeong, Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee, Bolun Zheng, Xiaohong Liu, Linhui Dai, Jun Chen, Xi Cheng, Zhen-Yong Fu, Jian Yang, Chul Lee, An Gia Vien, Hyunkook Park, Sabari Nathan, M. Parisa Beham, S Mohamed Mansoor Roomi, Florian Lemarchand, Maxime Pelcat, Erwan Nogues, Densen Puthussery, Hrishikesh P. S, Jiji C. V, Ashish Sinha, Xuan Zhao
Track 1 targeted the single image demoireing problem, which seeks to remove moire patterns from a single image.
no code implementations • 7 Mar 2020 • Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee
Multi-modal machine learning (ML) models can process data in multiple modalities (e.g., video, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding, activity recognition).
no code implementations • 7 Feb 2020 • Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee
Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding).
1 code implementation • ICCV 2017 • Jun-Tae Lee, Han-Ul Kim, Chul Lee, Chang-Su Kim
Then, we develop the line pooling layer to extract a feature vector for each candidate line from the feature maps.
Ranked #3 on Line Detection on SEL
no code implementations • 8 Dec 2016 • Yuelong Li, Chul Lee, Vishal Monga
For HDR video, a stiff practical challenge presents itself in the form of accurate correspondence estimation of objects between video frames.