no code implementations • SemEval (NAACL) 2022 • Jing Zhang, Yujin Wang
Online misogyny meme detection is an image/text multimodal classification task, the complicated relation of image and text challenges the intelligent system’s modality fusion learning capability.
no code implementations • 10 Apr 2025 • Yujin Wang, Jiarui Wu, Yichen Bian, Fan Zhang, Tianfan Xue
The generalization of learning-based high dynamic range (HDR) fusion is often limited by the availability of training data, as collecting large-scale HDR images from dynamic scenes is both costly and technically challenging.
1 code implementation • 26 Mar 2025 • Ziran Zhang, Xiaohui Li, Yihao Liu, Yujin Wang, Yueting Chen, Tianfan Xue, Shi Guo
Video frame interpolation (VFI) in scenarios with large motion remains challenging due to motion ambiguity between frames.
no code implementations • 18 Mar 2025 • Yujin Wang, Quanfeng Liu, Zhengxin Jiang, Tianyi Wang, Junfeng Jiao, Hongqing Chu, Bingzhao Gao, Hong Chen
Additionally, we fine-tune VLMs on a specifically curated dataset derived from the NuScenes dataset to enhance their spatial perception and bird's-eye view image comprehension capabilities.
no code implementations • 10 Mar 2025 • Jiarui Wu, Yujin Wang, Lingen Li, Zhang Fan, Tianfan Xue
To overcome these challenges, we propose a novel goal-conditioned reinforcement learning framework for efficiently tuning parameters using a goal image as a condition.
no code implementations • 18 Feb 2025 • Xin Xia, Yujin Wang, Jun Zhou, Guisheng Zhong, Linning Cai, Chen Zhang
In this paper, we introduce KAPPA, an integrated framework designed to construct keyphrase-based patent portraits and enhance patent analysis.
no code implementations • CVPR 2025 • Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue
In this work, we propose UltraFusion, the first exposure fusion technique that can merge input with 9 stops differences.
no code implementations • 15 Dec 2024 • Yujin Wang, Quanfeng Liu, Jiaqi Fan, Jinlong Hong, Hongqing Chu, Mengjian Tian, Bingzhao Gao, Hong Chen
We evaluate RAC3 through extensive experiments using a curated dataset of corner case scenarios, demonstrating its ability to enhance semantic alignment, improve hallucination mitigation, and achieve superior performance metrics, such as Cosine Similarity and ROUGE-L scores.
no code implementations • 30 Oct 2024 • Yujin Wang, Tianyi Xu, Fan Zhang, Tianfan Xue, Jinwei Gu
Based on this, AdaptiveISP utilizes deep reinforcement learning to automatically generate an optimal ISP pipeline and the associated ISP parameters to maximize the detection performance.
1 code implementation • 27 Sep 2024 • Ruikang Li, Yujin Wang, Shiqi Chen, Fan Zhang, Jinwei Gu, Tianfan Xue
The raw domain denoising adapts to sensor-specific noise as well as spatially varying noise levels, while the sRGB domain denoising adapts to ISP variations and removes residual noise amplified by the ISP.
Ranked #1 on
Image Denoising
on DND
no code implementations • CVPR 2024 • Gangwei Xu, Yujin Wang, Jinwei Gu, Tianfan Xue, Xin Yang
HDRFlow has three novel designs: an HDR-domain alignment loss (HALoss), an efficient flow network with a multi-size large kernel (MLK), and a new HDR flow training scheme.
no code implementations • 19 Sep 2023 • Yujin Wang, Lingen Li, Tianfan Xue, Jinwei Gu
To address the trade-off between visual appeal and fidelity of high-frequency details in denoising tasks, we propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG).
no code implementations • 5 Sep 2023 • TaeHoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun
In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge.
no code implementations • 18 Feb 2023 • Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng
However, the training of SSL models is computationally expensive and a common practice is to fine-tune a released SSL model on the specific task.
1 code implementation • 14 Nov 2022 • Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen
In this paper, we provide a new perspective on self-supervised speech models from how the training targets are obtained.
Ranked #46 on
Speech Recognition
on LibriSpeech test-other
no code implementations • 27 Oct 2022 • Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang
Recent years have witnessed great strides in self-supervised learning (SSL) on the speech processing.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2