Search Results for author: Xinpeng Ding

Found 16 papers, 10 papers with code

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models

1 code implementation • 2 Jan 2024 • Xinpeng Ding, Jianhua Han, Hang Xu, Xiaodan Liang, Wei Zhang, Xiaomeng Li

BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks.

Autonomous Driving
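The entry above only states that BEV information is injected into an MLLM; the paper's actual injection module is not reproduced here. Below is a minimal, hypothetical PyTorch sketch of one common way such an injection can be wired up: language-side tokens cross-attend to flattened BEV feature tokens before entering the language model. The class name `BEVInjection` and all dimensions are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class BEVInjection(nn.Module):
    """Illustrative sketch: fuse BEV feature tokens into MLLM token embeddings
    via cross-attention, so the language model can attend to spatial layout."""

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, llm_tokens: torch.Tensor, bev_tokens: torch.Tensor) -> torch.Tensor:
        # llm_tokens: (B, T, D) text/visual tokens entering the language model
        # bev_tokens: (B, N, D) flattened BEV grid features from multi-view images
        fused, _ = self.cross_attn(query=llm_tokens, key=bev_tokens, value=bev_tokens)
        return self.norm(llm_tokens + fused)  # residual injection

# Toy usage with random tensors
inject = BEVInjection()
tokens = inject(torch.randn(2, 32, 768), torch.randn(2, 200, 768))
print(tokens.shape)  # torch.Size([2, 32, 768])
```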

EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model

no code implementations • 5 Dec 2023 • Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao

To further clarify the noise of the expanded boundaries, we combine mutual learning with a tailored proposal-level contrastive objective, using a learnable approach to balance the incomplete yet clean (initial) boundaries against the comprehensive yet noisy (expanded) ones and obtain more precise boundaries.

Boundary Detection · Language Modelling +2
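The snippet above mentions a proposal-level contrastive objective between initial and expanded boundary proposals. As a rough illustration only (the paper's exact formulation may differ), an InfoNCE-style loss that pulls each initial (clean) proposal toward its own expanded (noisy but comprehensive) counterpart could look like this:

```python
import torch
import torch.nn.functional as F

def proposal_contrastive_loss(init_feats: torch.Tensor,
                              expand_feats: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Illustrative InfoNCE-style loss: each initial proposal is pulled toward
    its paired expanded proposal and pushed away from the other proposals."""
    init_feats = F.normalize(init_feats, dim=-1)      # (N, D)
    expand_feats = F.normalize(expand_feats, dim=-1)  # (N, D)
    logits = init_feats @ expand_feats.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(init_feats.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = proposal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```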

GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation

1 code implementation • ICCV 2023 • Jiewen Yang, Xinpeng Ding, Ziyang Zheng, Xiaowei Xu, Xiaomeng Li

This paper studies unsupervised domain adaptation (UDA) for echocardiogram video segmentation, where the goal is to generalize a model trained on the source domain to other unlabelled target domains.

Graph Matching · Segmentation +3

GL-Fusion: Global-Local Fusion Network for Multi-view Echocardiogram Video Segmentation

1 code implementation • 20 Sep 2023 • Ziyang Zheng, Jiewen Yang, Xinpeng Ding, Xiaowei Xu, Xiaomeng Li

Additionally, a Multi-view Local-based Fusion Module (MLFM) is designed to extract correlations of cardiac structures from different views.

Video Segmentation · Video Semantic Segmentation
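The MLFM described above extracts correlations of cardiac structures across views; the module below is a simplified stand-in, not the paper's implementation, showing how features from one echo view can attend to features from another. The class name `CrossViewFusion` and the dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Illustrative multi-view fusion: features from view A query features from
    view B so correlated cardiac structures reinforce each other."""

    def __init__(self, channels: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, C, H, W) feature maps from two echocardiogram views
        b, c, h, w = feat_a.shape
        qa = feat_a.flatten(2).transpose(1, 2)  # (B, H*W, C)
        kb = feat_b.flatten(2).transpose(1, 2)
        fused, _ = self.attn(qa, kb, kb)        # view A attends to view B
        return (qa + fused).transpose(1, 2).reshape(b, c, h, w)

out = CrossViewFusion()(torch.randn(1, 128, 16, 16), torch.randn(1, 128, 16, 16))
print(out.shape)  # torch.Size([1, 128, 16, 16])
```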

HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving

no code implementations • 11 Sep 2023 • Xinpeng Ding, Jianhua Han, Hang Xu, Wei Zhang, Xiaomeng Li

For the first time, we leverage a single multimodal large language model (MLLM) to consolidate multiple autonomous driving tasks from videos, i.e., the Risk Object Localization and Intention and Suggestion Prediction (ROLISP) task.

Autonomous Driving · Object Localization

Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

no code implementations • 24 Aug 2023 • Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang

The category prototype-guided mechanism for image-text matching drives the features of different classes to converge to distinct category prototypes that are kept uniformly distributed in the feature space, which improves class boundaries.

Attribute · Image-text matching +1
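Two pieces are implied by the snippet above: prototypes spread uniformly on the feature sphere, and features pulled toward their class prototype. The sketch below illustrates one simple way to obtain both, under the assumption that "uniformly distributed" means maximally separated directions on the unit hypersphere; it is not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def make_uniform_prototypes(num_classes: int, dim: int, steps: int = 200) -> torch.Tensor:
    """Illustrative: start from random directions and minimise each prototype's
    similarity to its nearest neighbour, spreading them over the hypersphere."""
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=0.1)
    for _ in range(steps):
        p = F.normalize(protos, dim=-1)
        sim = p @ p.t() - 2 * torch.eye(num_classes)  # mask out self-similarity
        loss = sim.max(dim=1).values.mean()           # push nearest neighbours apart
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(protos.detach(), dim=-1)

def prototype_alignment_loss(features: torch.Tensor, labels: torch.Tensor,
                             prototypes: torch.Tensor) -> torch.Tensor:
    """Pull each sample's feature toward its class prototype (cosine distance)."""
    feats = F.normalize(features, dim=-1)
    return (1 - (feats * prototypes[labels]).sum(dim=-1)).mean()

protos = make_uniform_prototypes(num_classes=10, dim=64)
loss = prototype_alignment_loss(torch.randn(32, 64), torch.randint(0, 10, (32,)), protos)
print(loss.item())
```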

Boosting Weakly-Supervised Temporal Action Localization with Text Information

1 code implementation • CVPR 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Xiaoyu Wang, Xinbo Gao

For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments.

Sentence · Weakly-supervised Temporal Action Localization +1
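The TSM mechanism above uses a class-label text description as a query to mine class-related segments. A minimal sketch of that idea, assuming the query and segment embeddings come from a paired text/video encoder (the embeddings below are random placeholders), is:

```python
import torch
import torch.nn.functional as F

def mine_class_segments(text_query: torch.Tensor,
                        segment_feats: torch.Tensor,
                        top_k: int = 5):
    """Illustrative text-to-segment mining: score every video segment by cosine
    similarity to a text query built from the action class label (e.g. a
    sentence like "a video of high jump"), then keep the top-k segments."""
    q = F.normalize(text_query, dim=-1)      # (D,) text query embedding
    s = F.normalize(segment_feats, dim=-1)   # (T, D) per-segment video features
    scores = s @ q                           # (T,) similarity per segment
    top = scores.topk(k=min(top_k, scores.numel()))
    return top.indices, scores

# Toy usage: in practice the embeddings come from a text/video encoder pair.
idx, scores = mine_class_segments(torch.randn(512), torch.randn(100, 512))
print(idx.tolist())
```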

Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint

1 code implementation • 25 Apr 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao

The proposed Bi-SCC first applies a temporal context augmentation to generate an augmented video that breaks the inter-video correlation between positive actions and their co-scene actions; a semantic consistency constraint (SCC) then enforces consistent predictions between the original and augmented videos, thereby suppressing the co-scene actions.

Weakly-supervised Temporal Action Localization · Weakly Supervised Temporal Action Localization
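The consistency part of Bi-SCC described above can be illustrated with a small loss term. This is a sketch only, assuming the constraint compares per-snippet class predictions of the original and augmented videos with a KL divergence; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(orig_logits: torch.Tensor,
                              aug_logits: torch.Tensor) -> torch.Tensor:
    """Illustrative semantic consistency constraint: force the per-snippet class
    predictions of the augmented video to match those of the original video."""
    # logits: (T, C) temporal class activation sequences over T snippets
    p_orig = F.softmax(orig_logits, dim=-1)
    log_p_aug = F.log_softmax(aug_logits, dim=-1)
    return F.kl_div(log_p_aug, p_orig, reduction="batchmean")

loss = semantic_consistency_loss(torch.randn(64, 20), torch.randn(64, 20))
print(loss.item())
```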

Cyclical Self-Supervision for Semi-Supervised Ejection Fraction Prediction from Echocardiogram Videos

1 code implementation • 20 Oct 2022 • Weihang Dai, Xiaomeng Li, Xinpeng Ding, Kwang-Ting Cheng

We also introduce teacher-student distillation to distill the information from LV segmentation masks into an end-to-end LVEF regression model that only requires video inputs.

LV Segmentation · regression +2
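The teacher-student distillation mentioned above can be sketched as a two-term loss: supervised EF regression where labels exist, plus a term pushing the video-only student to mimic representations from a segmentation-aware teacher. The function name, feature shapes, and loss weighting below are assumptions for illustration, not the authors' training code.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def lvef_distillation_loss(student_pred: torch.Tensor,
                           student_feats: torch.Tensor,
                           teacher_feats: torch.Tensor,
                           ef_labels: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Illustrative teacher-student objective: the student regresses LVEF from
    video only, while matching features derived from a teacher that sees LV
    segmentation masks."""
    distill = F.mse_loss(student_feats, teacher_feats.detach())  # mimic the teacher
    if ef_labels is None:                                        # unlabelled clip
        return distill
    supervised = F.mse_loss(student_pred, ef_labels)             # labelled clip
    return supervised + 0.5 * distill

loss = lvef_distillation_loss(torch.randn(4, 1), torch.randn(4, 128),
                              torch.randn(4, 128), torch.rand(4, 1) * 100)
print(loss.item())
```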

Learning Shadow Correspondence for Video Shadow Detection

no code implementations • 30 Jul 2022 • Xinpeng Ding, Jingweng Yang, Xiaowei Hu, Xiaomeng Li

We further design a new evaluation metric to evaluate the temporal stability of the video shadow detection results.

Shadow Detection
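The entry above introduces a metric for the temporal stability of video shadow detection but does not spell it out here. Purely as an illustration of what a stability score can measure (this is not the paper's metric), one could compute the mean IoU between predicted shadow masks of consecutive frames:

```python
import torch

def temporal_stability(masks: torch.Tensor, eps: float = 1e-6) -> float:
    """Hypothetical stability score: mean IoU between the predicted shadow masks
    of consecutive frames. A flickering predictor scores low, a temporally
    consistent one scores high."""
    # masks: (T, H, W) binary predictions over T frames
    prev, curr = masks[:-1].bool(), masks[1:].bool()
    inter = (prev & curr).flatten(1).sum(dim=1).float()
    union = (prev | curr).flatten(1).sum(dim=1).float()
    return ((inter + eps) / (union + eps)).mean().item()

print(temporal_stability(torch.randint(0, 2, (10, 64, 64))))
```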

Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions

1 code implementation • 19 May 2022 • Xinpeng Ding, Ziwei Liu, Xiaomeng Li

Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos.

Contrastive Learning · Self-Supervised Learning +2
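One way to read the key insight above is as a joint objective: standard contrastive self-supervision on surgical frames plus a distillation term toward a frozen, generically pretrained model. The sketch below assumes that reading; the weighting and feature choices are illustrative, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def ssl_with_distillation(z1: torch.Tensor, z2: torch.Tensor,
                          student_feats: torch.Tensor,
                          public_feats: torch.Tensor,
                          temperature: float = 0.1,
                          alpha: float = 1.0) -> torch.Tensor:
    """Illustrative combination of (a) an InfoNCE loss between two augmented
    views of the same surgical frames and (b) a distillation term matching the
    student's features to those of a frozen, publicly available model."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    contrastive = F.cross_entropy(logits, targets)       # self-supervision on surgical data
    distill = 1 - F.cosine_similarity(student_feats, public_feats.detach()).mean()
    return contrastive + alpha * distill

loss = ssl_with_distillation(torch.randn(16, 128), torch.randn(16, 128),
                             torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```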

Less is More: Surgical Phase Recognition from Timestamp Supervision

1 code implementation • 16 Feb 2022 • Xinpeng Ding, Xinjian Yan, Zixun Wang, Wei Zhao, Jian Zhuang, Xiaowei Xu, Xiaomeng Li

Our study uncovers unique insights into surgical phase recognition with timestamp supervision: 1) timestamp annotation reduces annotation time by 74% compared with full annotation, and surgeons tend to place timestamps near the middle of phases; 2) extensive experiments demonstrate that our method achieves results competitive with fully supervised methods while reducing manual annotation cost; 3) less is more in surgical phase recognition, i.e., fewer but discriminative pseudo labels outperform full labels that contain ambiguous frames; 4) the proposed UATD can be used as a plug-and-play method to clean ambiguous labels near phase boundaries and improve the performance of current surgical phase recognition methods.

Surgical phase recognition

Support-Set Based Cross-Supervision for Video Grounding

no code implementations • ICCV 2021 • Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao

The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.

Contrastive Learning · Video Grounding
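The two objectives described above (contrastive and captioning) can be combined into a single training loss. The sketch below assumes an InfoNCE term over matched video-text pairs plus a token-level cross-entropy term for the caption decoder; the inputs are random placeholders and the weighting is not taken from the paper.

```python
import torch
import torch.nn.functional as F

def cross_supervision_loss(video_feats: torch.Tensor,
                           text_feats: torch.Tensor,
                           caption_logits: torch.Tensor,
                           caption_tokens: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Illustrative pairing of the two objectives: a video-text contrastive term
    (InfoNCE over matched pairs) plus a captioning term (cross-entropy of a
    decoder predicting the query tokens from the video)."""
    v = F.normalize(video_feats, dim=-1)   # (B, D)
    t = F.normalize(text_feats, dim=-1)    # (B, D)
    logits = v @ t.t() / temperature
    targets = torch.arange(v.size(0))
    contrastive = F.cross_entropy(logits, targets)
    # caption_logits: (B, L, V) decoder outputs, caption_tokens: (B, L) targets
    caption = F.cross_entropy(caption_logits.flatten(0, 1), caption_tokens.flatten())
    return contrastive + caption

loss = cross_supervision_loss(torch.randn(8, 256), torch.randn(8, 256),
                              torch.randn(8, 12, 1000), torch.randint(0, 1000, (8, 12)))
print(loss.item())
```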
