Search Results for author: Sibei Yang

Found 29 papers, 12 papers with code

Propagating Over Phrase Relations for One-Stage Visual Grounding

no code implementations ECCV 2020 Sibei Yang, Guanbin Li, Yizhou Yu

Phrase level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence.

Phrase Grounding Relational Reasoning +2

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

no code implementations 2 Dec 2024 Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang

In this paper, we introduce the Sequential 3D Affordance Reasoning task, which extends the traditional paradigm by reasoning from cumbersome user intentions and then decomposing them into a series of segmentation maps.

Language Modeling Language Modelling +4

Plain-Det: A Plain Multi-Dataset Object Detector

1 code implementation 14 Jul 2024 Cheng Shi, Yuchen Zhu, Sibei Yang

Recent advancements in large-scale foundational models have sparked widespread interest in training highly proficient large vision models.

Object object-detection +1

Part2Object: Hierarchical Unsupervised 3D Instance Segmentation

1 code implementation 14 Jul 2024 Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, Sibei Yang

By training Hi-Mask3D on the objects and object parts extracted from Part2Object, we achieve consistent and superior performance compared to state-of-the-art models in various settings, including unsupervised instance segmentation, data-efficient fine-tuning, and cross-dataset generalization.

3D Instance Segmentation Clustering +4

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

1 code implementation 18 Apr 2024 Cheng Shi, Sibei Yang

Foundation models, pre-trained on a large amount of data, have demonstrated impressive zero-shot capabilities in various downstream tasks.

Instance Segmentation Object +3

Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

no code implementations CVPR 2024 Qiyuan Dai, Sibei Yang

Referring image segmentation (RIS) aims to precisely segment referents in images through corresponding natural language expressions, yet relying on cost-intensive mask annotations.

Image Segmentation Segmentation +1

RealDex: Towards Human-like Grasping for Robotic Dexterous Hand

no code implementations 21 Feb 2024 Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma

In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data.

Motion Generation

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

no code implementations CVPR 2024 Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu

At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information, through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block.

Motion Generation

TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

1 code implementation 30 Oct 2023 Meng Lou, Hong-Yu Zhou, Sibei Yang, Yizhou Yu

Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels.

Image Classification Object Detection +1

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator

2 code implementations NeurIPS 2023 Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, Sibei Yang

Text-to-video is a rapidly growing research area that aims to generate a sequence of frames that is coherent in semantics, identity, and time and that accurately aligns with the input text prompt.

Text-to-Video Generation Video Generation +1

Temporal Collection and Distribution for Referring Video Object Segmentation

no code implementations ICCV 2023 Jiajin Tang, Ge Zheng, Sibei Yang

Furthermore, to explicitly capture object motions and spatial-temporal cross-modal reasoning over objects, we propose a novel temporal collection-distribution mechanism for interacting between the global referent token and object queries.

Object Referring Video Object Segmentation +2

CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection

no code implementations ICCV 2023 Jiajin Tang, Ge Zheng, Jingyi Yu, Sibei Yang

Its challenge lies in object categories available for the task being too diverse to be limited to a closed set of object vocabulary for traditional object detection.

Object object-detection +2

EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment

no code implementations ICCV 2023 Cheng Shi, Sibei Yang

Vision-language models such as CLIP have boosted the performance of open-vocabulary object detection, where the detector is trained on base categories but required to detect novel categories.

Object object-detection +2

Contrastive Grouping with Transformer for Referring Image Segmentation

1 code implementation CVPR 2023 Jiajin Tang, Ge Zheng, Cheng Shi, Sibei Yang

Referring image segmentation aims to segment the target referent in an image conditioning on a natural language expression.

Contrastive Learning Image Segmentation +3

Grounded Image Text Matching with Mismatched Relation Reasoning

no code implementations ICCV 2023 Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.

Image-text matching Relation +2

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

1 code implementation 12 Apr 2023 Zhenxiang Lin, Xidong Peng, Peishan Cong, Ge Zheng, Yujin Sun, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma

We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds.

3D visual grounding Autonomous Driving +1

PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis

1 code implementation 2 Jan 2023 Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, Yizhou Yu

Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative, whose goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views.

Brain Tumor Segmentation Medical Image Analysis +4

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

no code implementations 27 Sep 2022 Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few.

Graph Representation Learning object-detection +3

Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts

2 code implementations ICCV 2021 Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Xiaoguang Han, Yizhou Yu

From this perspective, we introduce Preservational Learning to reconstruct diverse image contexts in order to preserve more information in learned representations.

Contrastive Learning Representation Learning +1

ConvNets vs. Transformers: Whose Visual Representations are More Transferable?

no code implementations 11 Aug 2021 Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Yizhou Yu

Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets.

Classification Depth Estimation +5

Bottom-Up Shift and Reasoning for Referring Image Segmentation

1 code implementation CVPR 2021 Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, Yizhou Yu

In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules.

Image Segmentation Segmentation +2

Graph-Structured Referring Expression Reasoning in The Wild

1 code implementation CVPR 2020 Sibei Yang, Guanbin Li, Yizhou Yu

The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression.

Referring Expression

Dynamic Graph Attention for Referring Expression Comprehension

no code implementations ICCV 2019 Sibei Yang, Guanbin Li, Yizhou Yu

In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression.

Graph Attention Referring Expression +2

Relationship-Embedded Representation Learning for Grounding Referring Expressions

1 code implementation CVPR 2019 Sibei Yang, Guanbin Li, Yizhou Yu

Unfortunately, existing work on grounding referring expressions fails to accurately extract multi-order relationships from the referring expression and associate them with the objects and their related contexts in the image.

Referring Expression Representation Learning

Cross-Modal Relationship Inference for Grounding Referring Expressions

no code implementations CVPR 2019 Sibei Yang, Guanbin Li, Yizhou Yu

A feasible solution for grounding referring expressions not only needs to extract all the necessary information (i.e., objects and the relationships among them) in both the image and referring expressions, but also compute and represent multimodal contexts from the extracted information.

Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

no code implementations 27 Apr 2019 Xiang He, Sibei Yang, Guanbin Li, Haofeng Li, Huiyou Chang, Yizhou Yu

In this paper, we discover that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks.

Image Segmentation Lesion Segmentation +3

Introduction to Clustering Algorithms and Applications

no code implementations 20 Aug 2014 Sibei Yang, Liangde Tao, Bingchen Gong

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure.

