1 code implementation • 30 Mar 2025 • Yuhang Yang, Ke Fan, Shangkun Sun, Hongxiang Li, Ailing Zeng, Feilin Han, Wei Zhai, Wei Liu, Yang Cao, Zheng-Jun Zha
The rapid advancement of video generation has rendered existing evaluation systems inadequate for assessing state-of-the-art models, primarily for three reasons: simple prompts cannot showcase a model's full capabilities, fixed evaluation operators struggle with Out-of-Distribution (OOD) cases, and computed metrics are misaligned with human preferences.
no code implementations • 17 Mar 2025 • Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou
Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools.
1 code implementation • 12 Dec 2024 • Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang
In this paper, we introduce a novel end-to-end multimodal large language model for the biomedical domain, named MedPLIB, which possesses pixel-level understanding.
1 code implementation • 12 Dec 2024 • Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen
Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.
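As a rough illustration of the sparse-to-dense idea described above — not the paper's learned module; the Gaussian-weighted interpolation, function name, and parameters here are all assumptions:

```python
import numpy as np

def sparse_to_dense_flow(keypoints, displacements, h, w, sigma=8.0):
    """Spread sparse keypoint displacements into a dense motion field
    via Gaussian-weighted interpolation (illustrative sketch only)."""
    ys, xs = np.mgrid[0:h, 0:w]                       # pixel coordinate grid
    grid = np.stack([xs, ys], axis=-1).astype(float)  # (h, w, 2), (x, y) order
    dense = np.zeros((h, w, 2))
    weight_sum = np.zeros((h, w, 1))
    for (kx, ky), d in zip(keypoints, displacements):
        dist2 = (grid[..., 0] - kx) ** 2 + (grid[..., 1] - ky) ** 2
        wgt = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
        dense += wgt * np.asarray(d, dtype=float)
        weight_sum += wgt
    return dense / np.maximum(weight_sum, 1e-8)

# Two keypoints moving in different directions yield a smooth dense field.
flow = sparse_to_dense_flow([(16, 16), (48, 48)],
                            [(2.0, 0.0), (0.0, -2.0)], 64, 64)
print(flow.shape)  # (64, 64, 2)
```

Near each keypoint the dense field matches that keypoint's displacement, and it blends smoothly in between — the "region-level dense guidance" role, minus the learned conditioning on the reference image.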
no code implementations • 26 Aug 2024 • Omar Alnaseri, Ibtesam R. K. Al-Saedi, Yassine Himeur, Hongxiang Li
In particular, the AE model achieves the best BER performance of 2 × 10⁻⁶ at 44 dB OSNR, surpassing traditional methods.
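The BER metric itself is simple to reproduce; a minimal sketch of how such a figure is computed (the function name and the synthetic bit streams are illustrative — this is not the paper's AE pipeline):

```python
import numpy as np

def bit_error_rate(tx_bits, rx_bits):
    """Fraction of received bits that differ from the transmitted bits."""
    tx = np.asarray(tx_bits)
    rx = np.asarray(rx_bits)
    return float(np.mean(tx != rx))

rng = np.random.default_rng(0)
tx = rng.integers(0, 2, size=1_000_000)       # transmitted bit stream
rx = tx.copy()
flip = rng.choice(len(rx), size=2, replace=False)
rx[flip] ^= 1                                 # inject exactly 2 bit errors
print(bit_error_rate(tx, rx))                 # 2 errors / 1e6 bits = 2e-06
```

Two errors in a million bits gives exactly the 2 × 10⁻⁶ level reported above.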
no code implementations • 31 May 2024 • Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang
Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports.
no code implementations • 31 May 2024 • Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Spoken Language Understanding (SLU) usually consists of two subtasks: intent detection and slot filling.
no code implementations • 30 May 2024 • Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu
Sign language video retrieval plays a key role in facilitating information access for the deaf community.
2 code implementations • 3 Apr 2024 • Xiaoshuang Huang, Hongxiang Li, Meng Cao, Long Chen, Chenyu You, Dong An
Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics.
1 code implementation • 18 Jan 2024 • Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji
Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges.
no code implementations • 19 Nov 2023 • Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.
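A minimal sketch of the mutual-learning objective described above, assuming the standard deep-mutual-learning formulation (each model's cross-entropy plus a KL term pulling it toward its peer's predictions); the function and variable names are illustrative, not the paper's code:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q) per example, summed over classes."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_losses(logits_manual, logits_asr, labels):
    """Loss for each of the two SLU models: own cross-entropy plus a KL
    term toward the peer model's distribution (deep mutual learning)."""
    p_m, p_a = softmax(logits_manual), softmax(logits_asr)
    n = len(labels)
    ce_m = -np.log(p_m[np.arange(n), labels] + 1e-12).mean()
    ce_a = -np.log(p_a[np.arange(n), labels] + 1e-12).mean()
    loss_m = ce_m + kl(p_a, p_m).mean()  # manual-transcript model mimics ASR model
    loss_a = ce_a + kl(p_m, p_a).mean()  # and vice versa
    return loss_m, loss_a
```

Training alternates (or jointly optimizes) the two losses, so knowledge flows iteratively between the model trained on manual transcripts and the one trained on ASR transcripts.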
no code implementations • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou
Video grounding suffers from two issues: (1) some visual entities co-exist in both the ground-truth moment and other moments, i.e., semantic overlapping; (2) only a few moments in each video are annotated, i.e., the sparse annotation dilemma. Consequently, vanilla contrastive learning cannot model the correlations between temporally distant moments and learns inconsistent video representations.
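For reference, the vanilla moment-level InfoNCE objective that breaks down under these two issues can be sketched as follows (the names and temperature value are assumptions; the point is that every non-annotated moment is treated as a negative, even when it shares visual entities with the ground truth):

```python
import numpy as np

def info_nce(query, moments, pos_idx, tau=0.07):
    """Vanilla InfoNCE over moment features: the single annotated moment
    is the positive, and all other moments are negatives -- which is
    exactly what semantic overlap and sparse annotation invalidate."""
    q = query / np.linalg.norm(query)
    m = moments / np.linalg.norm(moments, axis=1, keepdims=True)
    sims = m @ q / tau                       # cosine similarity / temperature
    logits = sims - sims.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[pos_idx]))
```

A moment that overlaps semantically with the ground truth still gets pushed away by this loss, which motivates the paper's departure from the vanilla formulation.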
no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.
no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou
Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.
no code implementations • 28 Feb 2015 • Weiyao Lin, Ming-Ting Sun, Hongxiang Li, Zhenzhong Chen, Wei Li, Bing Zhou
We demonstrate that this low-complexity method can efficiently capture the characteristics of the frame.
no code implementations • 21 Feb 2015 • Weiyao Lin, Yuanzhe Chen, Jianxin Wu, Hanli Wang, Bin Sheng, Hongxiang Li
Based on this network, we further model people in the scene as packages, and human activities as the process of package transmission through the network.