Search Results for author: Diandian Guo

Found 8 papers, 4 papers with code

Multi-View Incongruity Learning for Multimodal Sarcasm Detection

no code implementations • 1 Dec 2024 • Diandian Guo, Cong Cao, Fangfang Yuan, Yanbing Liu, Guangjie Zeng, Xiaoyan Yu, Hao Peng, Philip S. Yu

Experimental results demonstrate the superiority of MICL on benchmark datasets, and further analyses show that MICL mitigates the effect of spurious correlations.

Contrastive Learning, Data Augmentation, +2

Can Multimodal Large Language Model Think Analogically?

no code implementations • 2 Nov 2024 • Diandian Guo, Cong Cao, Fangfang Yuan, Dakui Wang, Wei Ma, Yanbing Liu, Jianhui Fu

The experiments show that our approach outperforms existing methods on popular datasets, providing preliminary evidence for the analogical reasoning capability of MLLM.

Language Modeling, Language Modelling, +2

Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resections with Pringle Maneuver

1 code implementation • 20 Aug 2024 • Diandian Guo, Weixin Si, Zhixi Li, Jialun Pei, Pheng-Ann Heng

Pringle maneuver (PM) in laparoscopic liver resection aims to reduce blood loss and provide a clear surgical view by intermittently blocking blood inflow of the liver, whereas prolonged PM may cause ischemic injury.

Blocking

Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms

no code implementations • 14 Apr 2024 • Diandian Guo, Manxi Lin, Jialun Pei, He Tang, Yueming Jin, Pheng-Ann Heng

A comprehensive understanding of surgical scenes allows for monitoring of the surgical process, reducing the occurrence of accidents and enhancing efficiency for medical professionals.

Graph Generation, Scene Graph Generation

S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR

1 code implementation • 22 Feb 2024 • Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng

In this study, we introduce a novel single-stage bi-modal transformer framework for scene graph generation (SGG) in the operating room (OR), termed S^2Former-OR, which aims to leverage multi-view 2D scenes and 3D point clouds in a complementary, end-to-end manner.

Graph Generation, object-detection, +3

Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

1 code implementation • CVPR 2024 • Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc van Gool

The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes.

Motion Estimation, Segmentation, +2

A Semi-Paired Approach For Label-to-Image Translation

no code implementations • 23 Jun 2023 • George Eskandar, Shuai Zhang, Mohamed Abdelsamad, Mark Youssef, Diandian Guo, Bin Yang

Data efficiency, or the ability to generalize from a few labeled data, remains a major challenge in deep learning.

Image-to-Image Translation, Translation

Towards Pragmatic Semantic Image Synthesis for Urban Scenes

1 code implementation • 16 May 2023 • George Eskandar, Diandian Guo, Karim Guirguis, Bin Yang

Second, in contrast to previous works which employ one discriminator that overfits the target domain semantic distribution, we employ a discriminator for the whole image and multiscale discriminators on the image patches.

Autonomous Driving, Image Generation
