Search Results for author: Hongxiang Li

Found 16 papers, 5 papers with code

VideoGen-Eval: Agent-based System for Video Generation Evaluation

1 code implementation30 Mar 2025 Yuhang Yang, Ke Fan, Shangkun Sun, Hongxiang Li, Ailing Zeng, Feilin Han, Wei Zhai, Wei Liu, Yang Cao, Zheng-Jun Zha

The rapid advancement of video generation has rendered existing evaluation systems inadequate for assessing state-of-the-art models, primarily due to simple prompts that cannot showcase the model's capabilities, fixed evaluation operators struggling with Out-of-Distribution (OOD) cases, and misalignment between computed metrics and human preferences.

Diversity Video Generation

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

no code implementations17 Mar 2025 Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou

Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools.

Computational Efficiency Data Augmentation +2

Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine

1 code implementation12 Dec 2024 Xiaoshuang Huang, Lingdong Shen, Jia Liu, Fangxin Shang, Hongxiang Li, Haifeng Huang, Yehui Yang

In this paper, we introduce a novel end-to-end multimodal large language model for the biomedical domain, named MedPLIB, which possesses pixel-level understanding.

Language Modeling Language Modelling +4

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

1 code implementation12 Dec 2024 Hongxiang Li, Yaowei Li, Yuhang Yang, Junjie Cao, Zhihong Zhu, Xuxin Cheng, Long Chen

Specifically, we generate a dense motion field from a sparse motion field and the reference image, which provides region-level dense guidance while maintaining the generalization of the sparse pose control.

Image Animation

Deep Learning Autoencoders for Reducing PAPR in Coherent Optical Systems

no code implementations26 Aug 2024 Omar Alnaseri, Ibtesam R. K. Al-Saedi, Yassine Himeur, Hongxiang Li

In particular, the AE model achieves the best BER performance of \(2 \times 10^{-6}\) at 44 dB OSNR, surpassing traditional methods.

Decoder Deep Learning

Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

2 code implementations3 Apr 2024 Xiaoshuang Huang, Hongxiang Li, Meng Cao, Long Chen, Chenyu You, Dong An

Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics.

Image Segmentation Medical Image Segmentation +2

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

no code implementations19 Nov 2023 Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

no code implementations ICCV 2023 Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou

Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both ground truth and other moments, \ie semantic overlapping; (2) only a few moments in the video are annotated, \ie sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learned inconsistent video representations.

Contrastive Learning Video Grounding

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

no code implementations ICCV 2023 Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.

Sentence Triplet

Exploiting Auxiliary Caption for Video Grounding

no code implementations15 Jan 2023 Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

Contrastive Learning Dense Video Captioning +2

Macroblock Classification Method for Video Applications Involving Motions

no code implementations28 Feb 2015 Weiyao Lin, Ming-Ting Sun, Hongxiang Li, Zhenzhong Chen, Wei Li, Bing Zhou

We demonstrate that this low-computation-complexity method can efficiently catch the characteristics of the frame.

Change Detection Classification +2

A new network-based algorithm for human activity recognition in video

no code implementations21 Feb 2015 Weiyao Lin, Yuanzhe Chen, Jianxin Wu, Hanli Wang, Bin Sheng, Hongxiang Li

Based on this network, we further model people in the scene as packages while human activities can be modeled as the process of package transmission in the network.

Activity Detection Activity Recognition In Videos +2

Cannot find the paper you are looking for? You can Submit a new open access paper.