1 code implementation • 4 Apr 2024 • Chuyu Zhang, Hui Ren, Xuming He
To be more precise, we employ the strategy of majorization to reformulate the SP$^2$OT problem into a Progressive Partial Optimal Transport problem, which can be transformed into an unbalanced optimal transport problem with augmented constraints and can be solved efficiently by a fast matrix scaling algorithm.
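The entry above reduces SP$^2$OT to an unbalanced optimal transport problem solved by a fast matrix scaling algorithm. As a rough, hypothetical illustration of what such matrix scaling looks like, the sketch below implements generic Sinkhorn-style scaling for entropic OT with exact row marginals and a KL-relaxed column marginal. It is not the paper's algorithm and omits its augmented constraints and progressive schedule. The function name and the parameters `eps` (entropic regularization) and `lam` (KL penalty weight) are assumptions.

```python
import math

def unbalanced_sinkhorn(cost, a, b, eps=0.1, lam=1.0, n_iters=500):
    """Hypothetical sketch: entropic OT with exact row sums `a` and KL-relaxed column sums `b`.

    cost: n x k list of lists; a: length-n row masses; b: length-k column masses.
    eps: entropic regularization; lam: weight of the KL penalty on the column marginal.
    """
    n, k = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(k)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * k
    # Exponent of the scaling step for a KL-penalized (relaxed) marginal
    power = lam / (lam + eps)
    for _ in range(n_iters):
        # Relaxed column update: v_j = (b_j / (K^T u)_j) ** (lam / (lam + eps))
        for j in range(k):
            ktu = sum(K[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / ktu) ** power
        # Hard row update: u_i = a_i / (K v)_i, so each row of the plan sums to a_i
        for i in range(n):
            kv = sum(K[i][j] * v[j] for j in range(k))
            u[i] = a[i] / kv
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(k)] for i in range(n)]
```

Because the row update runs last in each iteration, every row of the returned plan sums (up to rounding) to its entry in `a`, while the column sums only approximately match `b`; a larger `lam` enforces `b` more strictly.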
no code implementations • 1 Apr 2024 • Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
no code implementations • 1 Apr 2024 • Rongjie Li, Yu Wu, Xuming He
Generative vision-language models (VLMs) have shown impressive performance in zero-shot vision-language tasks like image captioning and visual question answering.
no code implementations • 21 Mar 2024 • Zeeshan Hayder, Xuming He
Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image, which is challenging due to incomplete labelling, long-tailed relationship categories, and relational semantic overlap.
no code implementations • 21 Feb 2024 • Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma
In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data.
1 code implementation • 23 Jan 2024 • Rongjie Li, Songyang Zhang, Xuming He
Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner.
1 code implementation • 17 Jan 2024 • Chuyu Zhang, Hui Ren, Xuming He
Deep clustering, which learns representations and performs semantic clustering without label information, poses a great challenge for deep learning-based approaches.
1 code implementation • 4 Jan 2024 • Longtian Qiu, Shan Ning, Xuming He
First, we observe that CLIP visual features of image subregions can lie closer to the paired caption, owing to the information loss inherent in text descriptions.
no code implementations • 4 Dec 2023 • Jiakai Zhang, Qihe Chen, Yan Zeng, Wenyuan Gao, Xuming He, Zhijie Liu, Jingyi Yu
To address this, we introduce physics-informed generative cryo-electron microscopy (GenEM), which for the first time integrates physics-based cryo-EM simulation with generative unpaired noise translation to generate physically correct synthetic cryo-EM datasets with realistic noise.
1 code implementation • 16 Nov 2023 • Bingnan Li, Zhitong Gao, Xuming He
Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization.
1 code implementation • 13 Nov 2023 • Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao
We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings.
Ranked #2 on Visual Question Answering on BenchLMM
no code implementations • 27 Sep 2023 • Xuanlong Yu, Yi Zuo, Zitao Wang, Xiaowen Zhang, Jiaxuan Zhao, Yuting Yang, Licheng Jiao, Rui Peng, Xinyi Wang, Junpei Zhang, Kexin Zhang, Fang Liu, Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Hanlin Tian, Kenta Matsui, Tianhao Wang, Fahmy Adan, Zhitong Gao, Xuming He, Quentin Bouniot, Hossein Moghaddam, Shyam Nandan Rai, Fabio Cermelli, Carlo Masone, Andrea Pilzer, Elisa Ricci, Andrei Bursuc, Arno Solin, Martin Trapp, Rui Li, Angela Yao, Wenlong Chen, Ivor Simpson, Neill D. F. Campbell, Gianni Franchi
This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023.
1 code implementation • 6 Aug 2023 • Chuyu Zhang, Ruijie Xu, Xuming He
In this paper, we consider a more realistic setting for novel class discovery where the distributions of novel and known classes are long-tailed.
no code implementations • ICCV 2023 • Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He
This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.
1 code implementation • ICCV 2023 • Yiteng Xu, Peishan Cong, Yichen Yao, Runnan Chen, Yuenan Hou, Xinge Zhu, Xuming He, Jingyi Yu, Yuexin Ma
Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc.
no code implementations • 19 Jul 2023 • Yiqi Xing, Yu Liu, Dayou Lu, Xinchen Zou, Xuming He
This procedure bridges the gap between simulation and practical power systems, and at the same time accounts for the uncertainty of system and fault parameters in practice.
2 code implementations • ICCV 2023 • Peiyan Gu, Chuyu Zhang, Ruijie Xu, Xuming He
In addition, to enable a flexible knowledge distillation scheme for each data point in novel classes, we develop a learnable weighting function for the regularization, which adaptively promotes knowledge transfer based on the semantic similarity between the novel and known classes.
1 code implementation • 20 Jun 2023 • Chuanyang Hu, Shipeng Yan, Zhitong Gao, Xuming He
Although deep learning has achieved great success, it often relies on large amounts of accurately labeled training data, which are expensive and time-consuming to collect.
1 code implementation • CVPR 2023 • Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He
In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection.
Ranked #8 on Human-Object Interaction Detection on V-COCO
Human-Object Interaction Detection, Knowledge Distillation
no code implementations • 2 Mar 2023 • Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He
Human-object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building block for many vision tasks.
Human-Object Interaction Detection, Knowledge Distillation
1 code implementation • NeurIPS 2021 • Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng
Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.
1 code implementation • 14 Dec 2022 • Zhitong Gao, Yucong Chen, Chuyu Zhang, Xuming He
In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images.
no code implementations • 31 Oct 2022 • Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He
In this work, we focus on learning a VLP model with sequential chunks of image-text pair data.
1 code implementation • 28 Sep 2022 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui
Contrastive Language-Image Pre-training (CLIP) has been shown to learn highly transferable visual representations, achieving promising accuracy in zero-shot classification.
Ranked #4 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)
Training-free 3D Point Cloud Classification, Transfer Learning
no code implementations • 19 Aug 2022 • Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Qian He, Chuanyang Hu, Errui Ding, Yu Guan, Xuming He
In this paper, we study the problem of one-shot skeleton-based action recognition, which poses unique challenges in learning transferable representation from base classes to novel classes, particularly for fine-grained actions.
1 code implementation • 15 Aug 2022 • Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava
In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations.
no code implementations • 7 Jul 2022 • Xiangxi Meng, Yuning Gu, Yongsheng Pan, Nizhuan Wang, Peng Xue, Mengkang Lu, Xuming He, Yiqiang Zhan, Dinggang Shen
Multi-modal medical image completion has been extensively applied to alleviate the missing modality issue in a wealth of multi-modal diagnostic tasks.
no code implementations • 24 Jun 2022 • Chuyu Zhang, Chuanyang Hu, Ruijie Xu, Zhitong Gao, Qian He, Xuming He
Our insight is to use mutual information to measure the relation between seen and unseen classes in a restricted label space, and to maximize it to promote the transfer of semantic knowledge.
1 code implementation • 10 Jun 2022 • Haozhe Wang, Chao Du, Panyan Fang, Shuo Yuan, Xuming He, Liang Wang, Bo Zheng
Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems.
no code implementations • 17 Apr 2022 • Zhichao Liu, Liyue Qian, Wenke Jing, Desen Zhou, Xuming He, Edmond Lou, Rui Zheng
The framework consisted of two closely linked modules: 1) a lamina detector that identifies and locates each lamina pair on ultrasound coronal images, and 2) a spinal curvature estimator that calculates the scoliotic angles from the chain of detected laminae.
no code implementations • CVPR 2022 • Jiangwei Xie, Shipeng Yan, Xuming He
Continual learning is an important problem for achieving human-level intelligence in real-world applications as an agent must continuously accumulate knowledge in response to streaming data/tasks.
1 code implementation • 10 Mar 2022 • Chuyu Zhang, Chuanyang Hu, Hui Ren, Yongfei Liu, Xuming He
We aim to tackle the problem of point-based interactive segmentation, in which the key challenge is to propagate the user-provided annotations to unlabeled regions efficiently.
Ranked #4 on Interactive Segmentation on SBD
1 code implementation • 3 Feb 2022 • Weizhen Liu, Qian He, Xuming He
Weakly supervised nuclei segmentation is a critical problem for pathological image analysis and greatly benefits the community due to the significant reduction of labeling cost.
no code implementations • 7 Jan 2022 • Shipeng Yan, Songyang Zhang, Xuming He
In this work, we introduce a new budget-aware few-shot learning problem that not only aims to learn novel object categories, but also needs to select informative examples to annotate in order to achieve data efficiency.
1 code implementation • CVPR 2022 • Rongjie Li, Songyang Zhang, Xuming He
Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.
no code implementations • 29 Sep 2021 • Rongjie Li, Songyang Zhang, Xuming He
We develop a decoding-and-assembling paradigm for the end-to-end scene graph generation.
1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after finetuning.
1 code implementation • 9 Sep 2021 • Qian He, Desen Zhou, Bo Wan, Xuming He
To address those challenges, we adopt a primitive-based representation for 3D objects and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module.
1 code implementation • 10 Aug 2021 • Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Yu Guan, Xuming He, Errui Ding
The task of skeleton-based action recognition remains a core challenge in human-centred scene understanding due to the multiple granularities and large variation in human motion.
1 code implementation • 8 Aug 2021 • Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, Xuming He
Incremental learning of semantic segmentation has emerged as a promising strategy for visual scene interpretation in the open-world setting.
1 code implementation • 27 Jul 2021 • Songyang Zhang, Lin Song, Songtao Liu, Zheng Ge, Zeming Li, Xuming He, Jian Sun
In this report, we introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
1 code implementation • 21 Jul 2021 • Shuailin Li, Zhitong Gao, Xuming He
Learning segmentation from noisy labels is an important task for medical image analysis due to the difficulty of acquiring high-quality annotations.
1 code implementation • 11 May 2021 • Songyang Zhang, Jiale Zhou, Xuming He
Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.
1 code implementation • 27 Apr 2021 • Qian He, Shuailin Li, Xuming He
Moreover, we introduce a weak annotation scheme with a hybrid label design for volumetric images, which improves model learning without increasing the overall annotation cost.
3 code implementations • CVPR 2021 • Rongjie Li, Songyang Zhang, Bo Wan, Xuming He
Scene graph generation is an important visual understanding task with a broad range of vision applications.
2 code implementations • CVPR 2021 • Shipeng Yan, Jiangwei Xie, Xuming He
We address the problem of class incremental learning, which is a core step towards achieving adaptive vision intelligence.
1 code implementation • CVPR 2021 • Songyang Zhang, Zeming Li, Shipeng Yan, Xuming He, Jian Sun
Motivated by our discovery, we propose a unified distribution alignment strategy for long-tail visual recognition.
Ranked #17 on Long-tail Learning on Places-LT
1 code implementation • ICCV 2021 • Quan Meng, Anpei Chen, Haimin Luo, Minye Wu, Hao Su, Lan Xu, Xuming He, Jingyi Yu
We introduce GNeRF, a framework that marries Generative Adversarial Networks (GAN) with Neural Radiance Field (NeRF) reconstruction for complex scenarios with unknown and even randomly initialized camera poses.
1 code implementation • CVPR 2021 • Yongfei Liu, Bo Wan, Lin Ma, Xuming He
Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.
1 code implementation • 9 Dec 2020 • Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou
Our numerical studies confirm the conquer estimator as a practical and reliable approach to large-scale inference for quantile regression.
Statistics Theory, Methodology
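The entry above refers to the conquer estimator for quantile regression. Assuming conquer denotes convolution-type smoothed quantile regression, the sketch below illustrates the idea: the non-smooth check loss is convolved with a Gaussian kernel of bandwidth `h`, giving a smooth convex objective that plain gradient descent can minimize. It uses a fixed step size rather than any adaptive step rule the authors may use, and the function names and defaults are hypothetical; it is not the authors' package.

```python
import math

def gaussian_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smoothed_quantile_regression(X, y, tau=0.5, h=1.0, n_iters=5000):
    """Hypothetical sketch: gradient descent on the Gaussian-kernel-smoothed check loss.

    X: list of n feature rows (include a 1.0 entry for an intercept); y: list of n responses.
    tau: quantile level in (0, 1); h: bandwidth of the smoothing kernel.
    """
    n, p = len(X), len(X[0])
    # The Hessian is bounded by (phi(0) / h) * mean ||x_i||^2, so its inverse is a safe fixed step.
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)
    mean_sq_norm = sum(sum(v * v for v in row) for row in X) / n
    lr = h / (phi0 * mean_sq_norm)
    beta = [0.0] * p
    for _ in range(n_iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            fitted = sum(b * v for b, v in zip(beta, xi))
            # Derivative of the smoothed check loss with respect to the fitted value
            w = gaussian_cdf((fitted - yi) / h) - tau
            for j in range(p):
                grad[j] += w * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta
```

For example, fitting an intercept-only model (`X = [[1.0]] * 101`) to `y = 1, 2, ..., 101` with `tau=0.5` returns an intercept very close to 51.0, the sample median, because the smoothed estimating equation is symmetric around that value.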
no code implementations • 25 Aug 2020 • Shuaiyi Huang, Qiuyue Wang, Xuming He
We are the first to exploit confidence during refinement to improve semantic matching accuracy, and we develop an end-to-end self-supervised adversarial learning procedure for the entire matching network.
no code implementations • 13 Aug 2020 • Quan Meng, Jiakai Zhang, Qiang Hu, Xuming He, Jingyi Yu
We present a novel real-time line segment detection scheme called Line Graph Neural Network (LGNN).
no code implementations • 26 Jul 2020 • Hongtao Yang, Tong Zhang, Wenbing Huang, Xuming He, Fatih Porikli
To be clear, in this paper we use unsupervised learning to refer to learning without task-specific human annotations, pairs, or any form of weak supervision.
4 code implementations • 21 Jul 2020 • Shuailin Li, Chuyu Zhang, Xuming He
Semi-supervised learning has attracted much attention in medical image segmentation due to challenges in acquiring pixel-wise image annotations, which is a crucial step for building high-performance deep learning methods.
2 code implementations • ECCV 2020 • Yongfei Liu, Xiangyi Zhang, Songyang Zhang, Xuming He
In this paper, we propose a novel few-shot semantic segmentation framework based on the prototype representation.
Ranked #3 on Few-Shot Semantic Segmentation on Pascal5i
no code implementations • 3 Mar 2020 • Haozhe Wang, Jiale Zhou, Xuming He
Despite the recent success of deep network-based Reinforcement Learning (RL), human-level efficiency in learning novel tasks remains elusive.
no code implementations • 9 Dec 2019 • Tao Wang, Xuming He, Yuanzheng Cai, Guobao Xiao
We present a context aware object detection method based on a retrieve-and-transform scene layout model.
2 code implementations • 20 Nov 2019 • Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He
To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.
1 code implementation • ICCV 2019 • Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He
Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories.
1 code implementation • ICCV 2019 • Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He
We instantiate our strategy by designing an end-to-end learnable deep network, named Dynamic Context Correspondence Network (DCCNet).
1 code implementation • 28 May 2019 • Songyang Zhang, Shipeng Yan, Xuming He
A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation.
no code implementations • 14 May 2019 • Tianyi Zhang, Dengji Zhao, Wen Zhang, Xuming He
We consider a fixed-price mechanism design setting where a seller sells one item via a social network, but the seller can only directly communicate with her neighbours initially.
no code implementations • CVPR 2018 • Hongtao Yang, Xuming He, Fatih Porikli
Learning-based temporal action localization methods require vast amounts of training data.
1 code implementation • CVPR 2018 • Alexander Mathews, Lexing Xie, Xuming He
We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images.
no code implementations • 15 May 2018 • Alexander Mathews, Lexing Xie, Xuming He
We simplify sentences with an attentive neural sequence-to-sequence model, dubbed S4.
no code implementations • CVPR 2018 • Miaomiao Liu, Xuming He, Mathieu Salzmann
By contrast, in this paper, we propose to exploit the 3D geometry of the scene to synthesize a novel view.
no code implementations • ICCV 2017 • Haoyang Zhang, Xuming He
In this work, we take a transformation-based approach that predicts a 2D non-rigid spatial transform and warps the shape mask onto the target object.
no code implementations • CVPR 2017 • Wei Zhuo, Mathieu Salzmann, Xuming He, Miaomiao Liu
In particular, while some of them aim at segmenting the image into regions, such as object or surface instances, others aim at inferring the semantic labels of given regions, or their support relationships.
1 code implementation • CVPR 2017 • Yufan Liu, Songyang Zhang, Mai Xu, Xuming He
On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces.
no code implementations • CVPR 2017 • Zeeshan Hayder, Xuming He, Mathieu Salzmann
In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal.
no code implementations • 11 Aug 2016 • Buyu Liu, Xuming He
With increasing demand for efficient image and video analysis, test-time cost of scene parsing becomes critical for many large-scale or time-sensitive vision applications.
no code implementations • 7 Jun 2016 • Salman H. Khan, Xuming He, Fatih Porikli, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
We apply a constrained mean-field algorithm to estimate the pixel-level labels, and use the estimated labels to update the parameters of the CNN in an iterative EM framework.
no code implementations • CVPR 2016 • Zeeshan Hayder, Xuming He, Mathieu Salzmann
In particular, we introduce a deep structured network that jointly predicts the objectness scores and the bounding box locations of multiple object candidates.
no code implementations • 31 May 2016 • Miaomiao Liu, Mathieu Salzmann, Xuming He
Despite much progress, state-of-the-art techniques suffer from two drawbacks: (i) they rely on the assumption that intensity edges coincide with depth discontinuities, which, unfortunately, is only true in controlled environments; and (ii) they typically exploit the availability of high-resolution training depth maps, which often cannot be acquired in practice due to sensor limitations.
no code implementations • ICCV 2015 • Zeeshan Hayder, Xuming He, Mathieu Salzmann
To exploit the correlations between objects, we build a fully-connected CRF on the candidates, which explicitly incorporates both geometric layout relations across object classes and similarity relations across multiple images.
no code implementations • 19 Nov 2015 • Miaomiao Liu, Mathieu Salzmann, Xuming He
To this end, we first study the problem of depth estimation from a single image.
no code implementations • 6 Oct 2015 • Alexander Mathews, Lexing Xie, Xuming He
We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments.
no code implementations • CVPR 2015 • Buyu Liu, Xuming He
To scale up our method, we adopt an active inference strategy to improve the efficiency, which adaptively selects object subgraphs in the object-augmented dense CRF.
no code implementations • CVPR 2015 • Salman H. Khan, Xuming He, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
Objects' spatial layout estimation and clutter identification are two important tasks to understand indoor scenes.
no code implementations • CVPR 2015 • Wei Zhuo, Mathieu Salzmann, Xuming He, Miaomiao Liu
We tackle the problem of single image depth estimation, which, without additional knowledge, suffers from many ambiguities.
no code implementations • CVPR 2014 • Xuming He, Stephen Gould
We address the problem of joint detection and segmentation of multiple object instances in an image, a key step towards scene understanding.
no code implementations • CVPR 2014 • Miaomiao Liu, Mathieu Salzmann, Xuming He
The unary potentials in this graphical model are computed by making use of the images with known depth.
no code implementations • CVPR 2013 • Tao Wang, Xuming He, Nick Barnes
We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments.
no code implementations • CVPR 2013 • Yansheng Ming, Hongdong Li, Xuming He
The main focus is on how to maintain consistency (compatibility) between the region cues and the boundary cues.
no code implementations • NeurIPS 2010 • Shuang Wu, Xuming He, Hongjing Lu, Alan L. Yuille
The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes.
no code implementations • NeurIPS 2008 • Xuming He, Richard S. Zemel
Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain.