1 code implementation • 13 Jun 2024 • Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim, Sanghyuk Chun, Taesup Moon
However, we highlight an inherent limitation of these projection-based CIR methods: a task discrepancy for the text encoder between its original pre-training task (text $\leftrightarrow$ image) and the target CIR task (image + text $\leftrightarrow$ image), which can degrade CIR performance.
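For context, here is a minimal sketch of how projection-based zero-shot CIR typically composes a query, assuming a CLIP-like frozen backbone; the `Projector` module, the placeholder-token injection, and all shapes are illustrative assumptions, not the paper's exact method:

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps a CLIP image embedding to a single pseudo word-token embedding."""
    def __init__(self, img_dim=512, tok_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim, tok_dim), nn.ReLU(), nn.Linear(tok_dim, tok_dim)
        )

    def forward(self, img_emb):          # (B, img_dim)
        return self.mlp(img_emb)         # (B, tok_dim)

def compose_query(text_tok_embs, pseudo_tok, slot):
    """Replace a placeholder token (e.g., '[*]') in the caption
    "a photo of [*] that <edit text>" with the projected image token.

    text_tok_embs: (B, L, tok_dim) token embeddings of the caption
    pseudo_tok:    (B, tok_dim) projected image embedding
    slot:          index of the placeholder position
    """
    text_tok_embs = text_tok_embs.clone()
    text_tok_embs[:, slot, :] = pseudo_tok
    return text_tok_embs  # fed through the *frozen* CLIP text encoder

# The task discrepancy: the frozen text encoder was pre-trained only to align
# pure text with images (text <-> image), yet at CIR time it must encode this
# hybrid image-token-plus-text query (image + text <-> image).
```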
1 code implementation • CVPR 2024 • Jaeseok Byun, Dohoon Kim, Taesup Moon
We consider the critical issue of false negatives in Vision-Language Pre-training (VLP), a challenge that arises from the inherent many-to-many correspondence of image-text pairs in large-scale web-crawled datasets.
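To make the failure mode concrete, here is a minimal sketch (not the paper's method) of the standard in-batch InfoNCE objective: it treats every off-diagonal image-text pair as a negative, even when a web-crawled caption actually describes the image, which is exactly the false-negative problem the abstract refers to:

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, tau=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                     # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Only the diagonal counts as positive; all off-diagonal pairs are
    # penalized, including semantically matching ones (false negatives).
    return F.cross_entropy(logits, targets)
```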
1 code implementation • 8 Aug 2022 • Jaeseok Byun, Taebaek Hwang, Jianlong Fu, Taesup Moon
In contrast to mainstream VLP methods, we highlight that two routinely applied steps during pre-training have a crucial impact on the performance of the pre-trained model: in-batch hard negative sampling for image-text matching (ITM) and assigning a large masking probability for masked language modeling (MLM).
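The two steps can be sketched as follows; the sampling rule reflects common VLP practice (ALBEF-style hard negative mining) rather than the paper's exact recipe, and `p=0.5` is only an assumed illustration of a "large" masking probability:

```python
import torch

def sample_hard_negatives(sim):
    """For each image, sample a *hard* in-batch negative caption: a
    non-matching caption the model currently scores as most similar.

    sim: (B, B) image-text similarity matrix for the current batch.
    """
    weights = sim.softmax(dim=1)
    weights.fill_diagonal_(0)                 # exclude the true positive
    return torch.multinomial(weights, 1).squeeze(1)  # hard-negative index per image

def mask_tokens(input_ids, mask_id, p=0.5):
    """Mask tokens with a large probability p (BERT's default is 0.15;
    the abstract argues a larger value matters for VLP)."""
    mask = torch.rand_like(input_ids, dtype=torch.float) < p
    labels = torch.where(mask, input_ids, torch.full_like(input_ids, -100))
    masked = torch.where(mask, torch.full_like(input_ids, mask_id), input_ids)
    return masked, labels
```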
1 code implementation • CVPR 2021 • Jaeseok Byun, Sungmin Cha, Taesup Moon
To that end, we propose the Fast Blind Image Denoiser (FBI-Denoiser) for Poisson-Gaussian noise, which consists of two neural network models: 1) PGE-Net, which estimates the Poisson-Gaussian noise parameters 2,000 times faster than conventional methods, and 2) FBI-Net, which realizes a much more efficient blind-spot network (BSN) for a pixelwise affine denoiser in terms of the number of parameters and inference speed.
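A compact sketch of the two pieces, under assumed interfaces (the real PGE-Net and FBI-Net architectures, and the paper's training details, are omitted):

```python
import torch
import torch.nn as nn

def add_poisson_gaussian_noise(x, alpha, sigma):
    """Poisson-Gaussian model: z = alpha * Poisson(x / alpha) + N(0, sigma^2).
    PGE-Net's job is the inverse: estimate (alpha, sigma) from a noisy z alone."""
    return torch.poisson(x / alpha) * alpha + sigma * torch.randn_like(x)

class PixelwiseAffineDenoiser(nn.Module):
    """FBI-Net-style head: a blind-spot backbone (which never sees the center
    pixel it predicts for) outputs per-pixel affine coefficients (a, b), and
    the denoised estimate is a * z + b applied to the noisy input itself."""
    def __init__(self, blind_spot_backbone):
        super().__init__()
        self.backbone = blind_spot_backbone   # assumed: (B, 1, H, W) -> (B, 2, H, W)

    def forward(self, z):
        a, b = self.backbone(z).chunk(2, dim=1)   # per-pixel slope and offset
        return a * z + b
```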