no code implementations • 26 Mar 2025 • Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Jiayi Ji, Jie Lou, Debing Zhang, Rongrong Ji
Visual instruction tuning (VIT) has emerged as a crucial technique for enabling multi-modal large language models (MLLMs) to follow user instructions adeptly.
no code implementations • 5 Mar 2025 • Zichao Li, Xueru Wen, Jie Lou, Yuqiu Ji, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun
Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences, particularly as LLMs increasingly interact with multimodal data.
no code implementations • 24 Feb 2025 • Xueru Wen, Jie Lou, Zichao Li, Yaojie Lu, Xing Yu, Yuqiu Ji, Guohai Xu, Hongyu Lin, Ben He, Xianpei Han, Le Sun, Debing Zhang
Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences.
no code implementations • 7 Feb 2025 • Xueru Wen, Jie Lou, Xinyu Lu, Junjie Yang, Yanjiang Liu, Yaojie Lu, Debing Zhang, Xingyu
As AI capabilities increasingly surpass human proficiency in complex tasks, current alignment techniques, including SFT and RLHF, face fundamental challenges in ensuring reliable oversight.
1 code implementation • 26 Jan 2025 • Zhiyuan Fan, Weinong Wang, Xing Wu, Debing Zhang
Using the same training data, our evaluator LM achieves a higher concordance rate with human grading results than other paradigms, including GPT-4, highlighting the superiority and efficiency of our approach.
1 code implementation • 22 Jan 2025 • Chaochen Gao, Xing Wu, Zijia Lin, Debing Zhang, Songlin Hu
Large language models (LLMs) with extended context windows have made significant strides, yet training them remains challenging due to the scarcity of long documents.
no code implementations • 20 Jan 2025 • Haotian Xu, Xing Wu, Weinong Wang, Zhongzhi Li, Da Zheng, Boyuan Chen, Yi Hu, Shijia Kang, Jiaming Ji, Yingying Zhang, Zhijiang Guo, Yaodong Yang, Muhan Zhang, Debing Zhang
In this work, we explore the untapped potential of scaling Long Chain-of-Thought (Long-CoT) data to 1000k samples, pioneering the development of a slow-thinking model, RedStar.
1 code implementation • 28 Oct 2024 • Weijian Luo, Colin Zhang, Debing Zhang, Zhengyang Geng
In this paper, we introduce Diff-Instruct* (DI*), an image-data-free approach for building one-step text-to-image generative models that align with human preferences while maintaining the ability to generate highly realistic images.
no code implementations • 8 Oct 2024 • Xueru Wen, Jie Lou, Yaojie Lu, Hongyu Lin, Xing Yu, Xinyu Lu, Ben He, Xianpei Han, Debing Zhang, Le Sun
Reward Models (RMs) are crucial for aligning language models with human preferences.
no code implementations • 3 Oct 2024 • Huimu Yu, Xing Wu, Weidong Yin, Debing Zhang, Songlin Hu
Large language models (LLMs) have made significant progress in natural language understanding and generation, driven by scalable pretraining and advanced finetuning.
no code implementations • 7 Sep 2024 • Junfeng Tian, Da Zheng, Yang Cheng, Rui Wang, Colin Zhang, Debing Zhang
Work on large language models (LLMs) has prioritized expanding the context window so that models can incorporate more information.
no code implementations • 29 Aug 2024 • Xin Zheng, Jie Lou, Boxi Cao, Xueru Wen, Yuqiu Ji, Hongyu Lin, Yaojie Lu, Xianpei Han, Debing Zhang, Le Sun
Self-critique has become a crucial mechanism for enhancing the reasoning performance of LLMs.
1 code implementation • ICCV 2023 • Yingping Liang, Jiaming Liu, Debing Zhang, Ying Fu
The accuracy of learning-based optical flow estimation models heavily relies on the realism of the training datasets.
no code implementations • 7 Aug 2023 • Zichun Wang, Yulun Zhang, Debing Zhang, Ying Fu
However, under their blind-spot constraints, previous self-supervised video denoising methods suffer from significant information loss and texture destruction in either the whole reference frame or neighboring frames, due to their inadequate consideration of the receptive field.
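For context on the blind-spot constraint criticized above, the following is a minimal, hedged sketch of the basic idea: a fraction of input pixels is replaced by neighboring values so the network cannot simply copy the noisy input, which is also where the information loss comes from. It is a simplified single-frame illustration, not this paper's video method; the function name and masking scheme are assumptions.

```python
import torch

def blind_spot_input(noisy: torch.Tensor, mask_ratio: float = 0.02) -> torch.Tensor:
    """Replace a random subset of pixels with values from nearby pixels.

    noisy: (B, C, H, W) noisy frames. The masked positions become training
    targets; the network never sees their original values, so it cannot
    learn the identity mapping, but that information is lost to it as well.
    """
    b, c, h, w = noisy.shape
    masked = noisy.clone()
    num = int(mask_ratio * h * w)
    for i in range(b):
        ys = torch.randint(0, h, (num,))
        xs = torch.randint(0, w, (num,))
        # Replace each masked pixel with a randomly chosen neighbor value.
        ny = (ys + torch.randint(-2, 3, (num,))).clamp(0, h - 1)
        nx = (xs + torch.randint(-2, 3, (num,))).clamp(0, w - 1)
        masked[i, :, ys, xs] = noisy[i, :, ny, nx]
    return masked
```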
1 code implementation • Submitted to ICLR 2022 • Wentao Zhu, Yufang Huang, Xiufeng Xie, Wenxian Liu, Jincan Deng, Debing Zhang, Zhangyang Wang, Ji Liu
For video content creation and understanding, shot boundary detection (SBD) is one of the most essential components in various scenarios.
Ranked #1 on Camera shot boundary detection on ClipShots
no code implementations • ICCV 2023 • Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, Deng Cai
Most of these methods fail to achieve realistic reconstruction when only a single image is available.
1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo
Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results.
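To make that three-stage structure concrete, here is a hedged sketch of such a conventional pipeline. The detector, recognizer, and tracker interfaces are hypothetical placeholders; this is the pipeline the paper argues against, not the method it proposes.

```python
from typing import Callable, Dict, List

def three_stage_video_text_spotting(
    frames: List["Image"],
    detect_text: Callable,      # frame -> list of text boxes
    recognize_text: Callable,   # (frame, box) -> recognized string
    track_streams: Callable,    # per-frame detections -> text streams
) -> Dict[int, List[str]]:
    """Sketch of the conventional pipeline: detect, recognize, then track.

    Stage 1: detect text boxes in every individual frame.
    Stage 2: recognize the localized text inside each box.
    Stage 3: link per-frame results into text streams via post-processing.
    """
    per_frame = []
    for frame in frames:
        boxes = detect_text(frame)
        texts = [recognize_text(frame, box) for box in boxes]
        per_frame.append(list(zip(boxes, texts)))
    # Post-processing associates detections across frames into streams.
    return track_streams(per_frame)
```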
3 code implementations • 9 Dec 2021 • Weijia Wu, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, Hong Zhou
Most existing video text spotting benchmarks focus on evaluating a single language and scenario with limited data.
no code implementations • 30 Oct 2021 • Jue Wang, Haofan Wang, Xing Wu, Chaochen Gao, Debing Zhang
In this paper, we present TransAug (Translate as Augmentation), which provides the first exploration of utilizing translated sentence pairs as data augmentation for text, and introduces a two-stage paradigm to advance state-of-the-art sentence embeddings.
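As a hedged illustration of using translated sentence pairs as augmentation for sentence embeddings, the snippet below treats each sentence and its translation as a positive pair under an InfoNCE-style contrastive loss with in-batch negatives. This is a simplified setup, not the paper's exact two-stage paradigm; the function name, temperature, and batch format are assumptions.

```python
import torch
import torch.nn.functional as F

def translation_contrastive_loss(emb_src: torch.Tensor,
                                 emb_trans: torch.Tensor,
                                 temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: each source sentence embedding should match the
    embedding of its translation, against in-batch negatives.

    emb_src, emb_trans: (B, D) embeddings of sentences and their translations.
    """
    src = F.normalize(emb_src, dim=-1)
    trans = F.normalize(emb_trans, dim=-1)
    logits = src @ trans.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, targets)
```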
no code implementations • 10 Sep 2021 • Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang
Additional rich unpaired single-modal text data is used to boost the generalization of the text branch.
7 code implementations • 11 Oct 2020 • Xiang An, Xuhan Zhu, Yang Xiao, Lan Wu, Ming Zhang, Yuan Gao, Bin Qin, Debing Zhang, Ying Fu
The experiment demonstrates no loss of accuracy when training with only 10% randomly sampled classes for the softmax-based loss functions, compared with training with the full set of classes, using state-of-the-art models on mainstream benchmarks.
Ranked #2 on Face Identification on MegaFace
1 code implementation • 30 Jun 2020 • Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang
8-bit quantization has been widely applied to accelerate network inference in various deep learning applications.
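As a hedged illustration of the general idea behind 8-bit inference (not the specific scheme proposed in this paper), the sketch below performs symmetric per-tensor quantization of a float tensor to int8 and the matching dequantization; the function names and the NumPy usage are assumptions for illustration.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of a float tensor to int8.

    Returns the quantized tensor and the scale needed to dequantize.
    """
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to approximate float values."""
    return q.astype(np.float32) * scale

# Toy usage: quantize a weight matrix and check the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize_int8(q, s)).max())
```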
no code implementations • ECCV 2018 • Ying Fu, Tao Zhang, Yinqiang Zheng, Debing Zhang, Hua Huang
Hyperspectral image (HSI) recovery from a single RGB image has attracted much attention, whose performance has recently been shown to be sensitive to the camera spectral sensitivity (CSS).