Search Results for author: Xuming He

Found 86 papers, 41 papers with code

Shape-aware Semi-supervised 3D Semantic Segmentation for Medical Images

4 code implementations 21 Jul 2020 Shuailin Li, Chuyu Zhang, Xuming He

Semi-supervised learning has attracted much attention in medical image segmentation due to challenges in acquiring pixel-wise image annotations, which is a crucial step for building high-performance deep learning methods.

3D Semantic Segmentation Image Segmentation +3

GNeRF: GAN-based Neural Radiance Field without Posed Camera

1 code implementation ICCV 2021 Quan Meng, Anpei Chen, Haimin Luo, Minye Wu, Hao Su, Lan Xu, Xuming He, Jingyi Yu

We introduce GNeRF, a framework to marry Generative Adversarial Networks (GAN) with Neural Radiance Field (NeRF) reconstruction for complex scenarios with unknown and even randomly initialized camera poses.

Novel View Synthesis

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

1 code implementation ICCV 2019 Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He

Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances and subtle visual differences between relation categories.

Human-Object Interaction Detection Object +2

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

1 code implementation Findings (NAACL) 2022 Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan

Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad scope of vision-language tasks after finetuning.

Knowledge Distillation Object +1

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

1 code implementation 28 May 2019 Songyang Zhang, Shipeng Yan, Xuming He

A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation.

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

1 code implementation 28 Sep 2022 Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification.

Training-free 3D Point Cloud Classification Transfer Learning +1

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

1 code implementation CVPR 2018 Alexander Mathews, Lexing Xie, Xuming He

We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images.

Descriptive Image Captioning +1

SGTR: End-to-end Scene Graph Generation with Transformer

1 code implementation CVPR 2022 Rongjie Li, Songyang Zhang, Xuming He

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.

graph construction Graph Generation +1

SGTR+: End-to-end Scene Graph Generation with Transformer

1 code implementation 23 Jan 2024 Rongjie Li, Songyang Zhang, Xuming He

Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner.

graph construction Graph Generation +1

Learning Cross-modal Context Graph for Visual Grounding

2 code implementations 20 Nov 2019 Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He

To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.

Graph Matching Visual Grounding

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

1 code implementation CVPR 2023 Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He

In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection.

Human-Object Interaction Detection Knowledge Distillation +2

Superpixel-guided Iterative Learning from Noisy Labels for Medical Image Segmentation

1 code implementation 21 Jul 2021 Shuailin Li, Zhitong Gao, Xuming He

Learning segmentation from noisy labels is an important task for medical image analysis due to the difficulty in acquiring high-quality annotations.

Image Segmentation Medical Image Segmentation +3

Dynamic Grained Encoder for Vision Transformers

1 code implementation NeurIPS 2021 Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.

Image Classification Language Modelling +2

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

1 code implementation CVPR 2021 Yongfei Liu, Bo Wan, Lin Ma, Xuming He

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.

Object Relation +3

Human-centric Scene Understanding for 3D Large-scale Scenarios

1 code implementation ICCV 2023 Yiteng Xu, Peishan Cong, Yichen Yao, Runnan Chen, Yuenan Hou, Xinge Zhu, Xuming He, Jingyi Yu, Yuexin Ma

Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc.

Action Recognition Scene Understanding +1

Dynamic Context Correspondence Network for Semantic Alignment

1 code implementation ICCV 2019 Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He

We instantiate our strategy by designing an end-to-end learnable deep network, named Dynamic Context Correspondence Network (DCCNet).

Semantic correspondence Weakly-supervised Learning

Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Experts

1 code implementation 14 Dec 2022 Zhitong Gao, Yucong Chen, Chuyu Zhang, Xuming He

In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images.

Segmentation

Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition

1 code implementation 10 Aug 2021 Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Yu Guan, Xuming He, Errui Ding

The task of skeleton-based action recognition remains a core challenge in human-centred scene understanding due to the multiple granularities and large variation in human motion.

Action Classification Action Recognition +2

Class-relation Knowledge Distillation for Novel Class Discovery

2 code implementations ICCV 2023 Peiyan Gu, Chuyu Zhang, Ruijie Xu, Xuming He

In addition, to enable a flexible knowledge distillation scheme for each data point in novel classes, we develop a learnable weighting function for the regularization, which adaptively promotes knowledge transfer based on the semantic similarity between the novel and known classes.

Knowledge Distillation Novel Class Discovery +4

Learning Semantic Correspondence with Sparse Annotations

1 code implementation 15 Aug 2022 Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava

In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations.

Denoising Semantic correspondence

Learning Implicit Temporal Alignment for Few-shot Video Classification

1 code implementation 11 May 2021 Songyang Zhang, Jiale Zhou, Xuming He

Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.

Action Recognition In Videos Classification +2

Weakly Supervised Nuclei Segmentation via Instance Learning

1 code implementation 3 Feb 2022 Weizhen Liu, Qian He, Xuming He

Weakly supervised nuclei segmentation is a critical problem for pathological image analysis and greatly benefits the community due to the significant reduction of labeling cost.

Instance Segmentation Representation Learning +2

Novel Class Discovery for Long-tailed Recognition

1 code implementation 6 Aug 2023 Chuyu Zhang, Ruijie Xu, Xuming He

In this paper, we consider a more realistic setting for novel class discovery where the distributions of novel and known classes are long-tailed.

Novel Class Discovery

P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced Clustering

1 code implementation 17 Jan 2024 Chuyu Zhang, Hui Ren, Xuming He

Deep clustering, which learns representation and semantic clustering without label information, poses a great challenge for deep learning-based approaches.

Clustering Deep Clustering +1

Weakly Supervised Volumetric Segmentation via Self-taught Shape Denoising Model

1 code implementation 27 Apr 2021 Qian He, Shuailin Li, Xuming He

Moreover, we introduce a weak annotation scheme with a hybrid label design for volumetric images, which improves model learning without increasing the overall annotation cost.

Denoising Organ Segmentation +2

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

1 code implementation 4 Jan 2024 Longtian Qiu, Shan Ning, Xuming He

Firstly, we observe that CLIP's visual features of image subregions can achieve closer proximity to the paired caption due to the inherent information loss in text descriptions.

Descriptive Image Captioning +1

An EM Framework for Online Incremental Learning of Semantic Segmentation

1 code implementation 8 Aug 2021 Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, Xuming He

Incremental learning of semantic segmentation has emerged as a promising strategy for visual scene interpretation in the open-world setting.

Incremental Learning Missing Labels +2

Single Image 3D Object Estimation with Primitive Graph Networks

1 code implementation 9 Sep 2021 Qian He, Desen Zhou, Bo Wan, Xuming He

To address those challenges, we adopt a primitive-based representation for 3D objects, and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module.

Object Scene Understanding

SP$^2$OT: Semantic-Regularized Progressive Partial Optimal Transport for Imbalanced Clustering

1 code implementation 4 Apr 2024 Chuyu Zhang, Hui Ren, Xuming He

To be more precise, we employ the strategy of majorization to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and can be solved efficiently by a fast matrix scaling algorithm.

Clustering Deep Clustering +1

Predicting Salient Face in Multiple-Face Videos

1 code implementation CVPR 2017 Yufan Liu, Songyang Zhang, Mai Xu, Xuming He

On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces.

Saliency Prediction

MILD: Modeling the Instance Learning Dynamics for Learning with Noisy Labels

1 code implementation 20 Jun 2023 Chuanyang Hu, Shipeng Yan, Zhitong Gao, Xuming He

Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect.

Learning with noisy labels Memorization

Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation

1 code implementation 16 Nov 2023 Bingnan Li, Zhitong Gao, Xuming He

Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization.

Domain Generalization Medical Diagnosis +3

Simplifying Sentences with Sequence to Sequence Models

no code implementations 15 May 2018 Alexander Mathews, Lexing Xie, Xuming He

We simplify sentences with an attentive neural network sequence to sequence model, dubbed S4.

Style Transfer Text Generation +1

Geometry-aware Deep Network for Single-Image Novel View Synthesis

no code implementations CVPR 2018 Miaomiao Liu, Xuming He, Mathieu Salzmann

By contrast, in this paper, we propose to exploit the 3D geometry of the scene to synthesize a novel view.

Novel View Synthesis

Learning deep structured network for weakly supervised change detection

no code implementations 7 Jun 2016 Salman H. Khan, Xuming He, Fatih Porikli, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework.

Change Detection

Boundary-aware Instance Segmentation

no code implementations CVPR 2017 Zeeshan Hayder, Xuming He, Mathieu Salzmann

In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal.

Instance Segmentation Object +3

Learning Dynamic Hierarchical Models for Anytime Scene Labeling

no code implementations 11 Aug 2016 Buyu Liu, Xuming He

With increasing demand for efficient image and video analysis, test-time cost of scene parsing becomes critical for many large-scale or time-sensitive vision applications.

Model Selection Representation Learning +2

Semantic-Aware Depth Super-Resolution in Outdoor Scenes

no code implementations 31 May 2016 Miaomiao Liu, Mathieu Salzmann, Xuming He

Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which can often not be acquired in practice due to the sensors' limitations.

Super-Resolution

SentiCap: Generating Image Descriptions with Sentiments

no code implementations 6 Oct 2015 Alexander Mathews, Lexing Xie, Xuming He

We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments.

Decision Making Descriptive +2

A unified model of short-range and long-range motion perception

no code implementations NeurIPS 2010 Shuang Wu, Xuming He, Hongjing Lu, Alan L. Yuille

The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes.

Learning Hybrid Models for Image Annotation with Partially Labeled Data

no code implementations NeurIPS 2008 Xuming He, Richard S. Zemel

Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain.

Winding Number for Region-Boundary Consistent Salient Contour Extraction

no code implementations CVPR 2013 Yansheng Ming, Hongdong Li, Xuming He

The main focus is given to how to maintain the consistency (compatibility) between the region cues and the boundary cues.

Boundary Detection Segmentation

An Exemplar-based CRF for Multi-instance Object Segmentation

no code implementations CVPR 2014 Xuming He, Stephen Gould

We address the problem of joint detection and segmentation of multiple object instances in an image, a key step towards scene understanding.

Instance Segmentation Object +3

Indoor Scene Structure Analysis for Single Image Depth Estimation

no code implementations CVPR 2015 Wei Zhuo, Mathieu Salzmann, Xuming He, Miaomiao Liu

We tackle the problem of single image depth estimation, which, without additional knowledge, suffers from many ambiguities.

Depth Estimation

Multiclass Semantic Video Segmentation With Object-Level Active Inference

no code implementations CVPR 2015 Buyu Liu, Xuming He

To scale up our method, we adopt an active inference strategy to improve the efficiency, which adaptively selects object subgraphs in the object-augmented dense CRF.

Object Segmentation +3

Separating Objects and Clutter in Indoor Scenes

no code implementations CVPR 2015 Salman H. Khan, Xuming He, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

Objects' spatial layout estimation and clutter identification are two important tasks to understand indoor scenes.

Learning to Co-Generate Object Proposals With a Deep Structured Network

no code implementations CVPR 2016 Zeeshan Hayder, Xuming He, Mathieu Salzmann

In particular, we introduce a deep structured network that jointly predicts the objectness scores and the bounding box locations of multiple object candidates.

Object object-detection +2

Indoor Scene Parsing With Instance Segmentation, Semantic Labeling and Support Relationship Inference

no code implementations CVPR 2017 Wei Zhuo, Mathieu Salzmann, Xuming He, Miaomiao Liu

In particular, while some of them aim at segmenting the image into regions, such as object or surface instances, others aim at inferring the semantic labels of given regions, or their support relationships.

Instance Segmentation Scene Parsing +1

Structural Kernel Learning for Large Scale Multiclass Object Co-Detection

no code implementations ICCV 2015 Zeeshan Hayder, Xuming He, Mathieu Salzmann

To exploit the correlations between objects, we build a fully-connected CRF on the candidates, which explicitly incorporates both geometric layout relations across object classes and similarity relations across multiple images.

Object object-detection +1

Deep Free-Form Deformation Network for Object-Mask Registration

no code implementations ICCV 2017 Haoyang Zhang, Xuming He

In this work, we take a transformation based approach that predicts a 2D non-rigid spatial transform and warps the shape mask onto the target object.

Object Semantic Segmentation

Fixed-price Diffusion Mechanism Design

no code implementations 14 May 2019 Tianyi Zhang, Dengji Zhao, Wen Zhang, Xuming He

We consider a fixed-price mechanism design setting where a seller sells one item via a social network, but the seller can only directly communicate with her neighbours initially.

Learning a Layout Transfer Network for Context Aware Object Detection

no code implementations 9 Dec 2019 Tao Wang, Xuming He, Yuanzheng Cai, Guobao Xiao

We present a context aware object detection method based on a retrieve-and-transform scene layout model.

Autonomous Driving Object +2

Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning

no code implementations 3 Mar 2020 Haozhe Wang, Jiale Zhou, Xuming He

Despite recent success of deep network-based Reinforcement Learning (RL), it remains elusive to achieve human-level efficiency in learning novel tasks.

Meta-Learning Meta Reinforcement Learning +2

Towards Purely Unsupervised Disentanglement of Appearance and Shape for Person Images Generation

no code implementations 26 Jul 2020 Hongtao Yang, Tong Zhang, Wenbing Huang, Xuming He, Fatih Porikli

To be clear, in this paper, we refer to unsupervised learning as learning without task-specific human annotations, pairs or any form of weak supervision.

Disentanglement

LGNN: A Context-aware Line Segment Detector

no code implementations 13 Aug 2020 Quan Meng, Jiakai Zhang, Qiang Hu, Xuming He, Jingyi Yu

We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN).

Line Segment Detection

Confidence-aware Adversarial Learning for Self-supervised Semantic Matching

no code implementations 25 Aug 2020 Shuaiyi Huang, Qiuyue Wang, Xuming He

We are the first to exploit confidence during refinement to improve semantic matching accuracy and to develop an end-to-end self-supervised adversarial learning procedure for the entire matching network.

Self-Supervised Learning Semantic correspondence

Smoothed Quantile Regression with Large-Scale Inference

1 code implementation 9 Dec 2020 Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou

Our numerical studies confirm the conquer estimator as a practical and reliable approach to large-scale inference for quantile regression.

Statistics Theory Methodology

Budget-aware Few-shot Learning via Graph Convolutional Network

no code implementations 7 Jan 2022 Shipeng Yan, Songyang Zhang, Xuming He

In this work, we introduce a new budget-aware few-shot learning problem that not only aims to learn novel object categories, but also needs to select informative examples to annotate in order to achieve data efficiency.

Few-Shot Learning Informativeness

Cascaded Sparse Feature Propagation Network for Interactive Segmentation

1 code implementation 10 Mar 2022 Chuyu Zhang, Chuanyang Hu, Hui Ren, Yongfei Liu, Xuming He

We aim to tackle the problem of point-based interactive segmentation, in which the key challenge is to propagate the user-provided annotations to unlabeled regions efficiently.

Foreground Segmentation Interactive Segmentation +2

General Incremental Learning with Domain-aware Categorical Representations

no code implementations CVPR 2022 Jiangwei Xie, Shipeng Yan, Xuming He

Continual learning is an important problem for achieving human-level intelligence in real-world applications as an agent must continuously accumulate knowledge in response to streaming data/tasks.

Class Incremental Learning Incremental Learning

Automatic spinal curvature measurement on ultrasound spine images using Faster R-CNN

no code implementations 17 Apr 2022 Zhichao Liu, Liyue Qian, Wenke Jing, Desen Zhou, Xuming He, Edmond Lou, Rui Zheng

The framework consisted of two closely linked modules: 1) the lamina detector for identifying and locating each lamina pair on ultrasound coronal images, and 2) the spinal curvature estimator for calculating the scoliotic angles based on the chain of detected laminae.

Mutual Information-guided Knowledge Transfer for Novel Class Discovery

no code implementations 24 Jun 2022 Chuyu Zhang, Chuanyang Hu, Ruijie Xu, Zhitong Gao, Qian He, Xuming He

Our insight is to utilize mutual information to measure the relation between seen classes and unseen classes in a restricted label space; maximizing this mutual information promotes the transfer of semantic knowledge.

Novel Class Discovery Relation +1

A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion

no code implementations 7 Jul 2022 Xiangxi Meng, Yuning Gu, Yongsheng Pan, Nizhuan Wang, Peng Xue, Mengkang Lu, Xuming He, Yiqiang Zhan, Dinggang Shen

Multi-modal medical image completion has been extensively applied to alleviate the missing modality issue in a wealth of multi-modal diagnostic tasks.

Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition

no code implementations 19 Aug 2022 Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Qian He, Chuanyang Hu, Errui Ding, Yu Guan, Xuming He

In this paper, we study the problem of one-shot skeleton-based action recognition, which poses unique challenges in learning transferable representation from base classes to novel classes, particularly for fine-grained actions.

Action Recognition Meta-Learning +1

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

no code implementations 2 Mar 2023 Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He

Human object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building-block for many vision tasks.

Human-Object Interaction Detection Knowledge Distillation +3

A Physics-Informed Data-Driven Fault Location Method for Transmission Lines Using Single-Ended Measurements with Field Data Validation

no code implementations 19 Jul 2023 Yiqi Xing, Yu Liu, Dayou Lu, Xinchen Zou, Xuming He

This procedure bridges the gap between simulation and practical power systems, and at the same time considers the uncertainty of system and fault parameters in practice.

Grounded Image Text Matching with Mismatched Relation Reasoning

no code implementations ICCV 2023 Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.

Image-text matching Relation +2

GenEM: Physics-Informed Generative Cryo-Electron Microscopy

no code implementations 4 Dec 2023 Jiakai Zhang, Qihe Chen, Yan Zeng, Wenyuan Gao, Xuming He, Zhijie Liu, Jingyi Yu

To address this, we introduce physics-informed generative cryo-electron microscopy (GenEM), which for the first time integrates physics-based cryo-EM simulation with a generative unpaired noise translation to generate physically correct synthetic cryo-EM datasets with realistic noise.

Contrastive Learning Pose Estimation +1

RealDex: Towards Human-like Grasping for Robotic Dexterous Hand

no code implementations 21 Feb 2024 Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma

In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data.

DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation

no code implementations 21 Mar 2024 Zeeshan Hayder, Xuming He

Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image, which is challenging due to incomplete labelling, long-tailed relationship categories, and relational semantic overlap.

Graph Generation Graph Matching +3

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

no code implementations 1 Apr 2024 Rongjie Li, Yu Wu, Xuming He

Generative vision-language models (VLMs) have shown impressive performance in zero-shot vision-language tasks like image captioning and visual question answering.

Image Captioning Instruction Following +5

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

no code implementations 1 Apr 2024 Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He

Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.

Graph Generation Relation +2
