Search Results for author: Sibei Yang

Found 26 papers, 9 papers with code

Propagating Over Phrase Relations for One-Stage Visual Grounding

no code implementations • ECCV 2020 • Sibei Yang, Guanbin Li, Yizhou Yu

Phrase-level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence.

Phrase Grounding Relational Reasoning +2

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

1 code implementation • 18 Apr 2024 • Cheng Shi, Sibei Yang

Foundation models, pre-trained on a large amount of data, have demonstrated impressive zero-shot capabilities in various downstream tasks.

Instance Segmentation Object +3

Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation

no code implementations • 18 Apr 2024 • Qiyuan Dai, Sibei Yang

Referring image segmentation (RIS) aims to precisely segment referents in images through corresponding natural language expressions, yet it relies on cost-intensive mask annotations.

Image Segmentation Segmentation +1

RealDex: Towards Human-like Grasping for Robotic Dexterous Hand

no code implementations • 21 Feb 2024 • Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, Yuexin Ma

In this paper, we introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data.

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

no code implementations • 14 Dec 2023 • Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu

At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information, through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block.

TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

1 code implementation • 30 Oct 2023 • Meng Lou, Hong-Yu Zhou, Sibei Yang, Yizhou Yu

Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels.

Image Classification Object Detection +1

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator

2 code implementations • NeurIPS 2023 • Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, Sibei Yang

Text-to-video is a rapidly growing research area that aims to generate a semantic, identical, and temporally coherent sequence of frames that accurately aligns with the input text prompt.

Text-to-Video Generation Video Generation +1

Temporal Collection and Distribution for Referring Video Object Segmentation

no code implementations • ICCV 2023 • Jiajin Tang, Ge Zheng, Sibei Yang

Furthermore, to explicitly capture object motions and spatial-temporal cross-modal reasoning over objects, we propose a novel temporal collection-distribution mechanism for interacting between the global referent token and object queries.

Object Referring Video Object Segmentation +2

EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment

no code implementations • ICCV 2023 • Cheng Shi, Sibei Yang

Vision-language models such as CLIP have boosted the performance of open-vocabulary object detection, where the detector is trained on base categories but required to detect novel categories.

Object object-detection +2

CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection

no code implementations • ICCV 2023 • Jiajin Tang, Ge Zheng, Jingyi Yu, Sibei Yang

Its challenge lies in object categories available for the task being too diverse to be limited to a closed set of object vocabulary for traditional object detection.

Object object-detection +2

LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models

no code implementations • ICCV 2023 • Cheng Shi, Sibei Yang

Prompt engineering is a powerful tool used to enhance the performance of pre-trained models on downstream tasks.

Domain Generalization Few-Shot Learning +2

Contrastive Grouping with Transformer for Referring Image Segmentation

1 code implementation • CVPR 2023 • Jiajin Tang, Ge Zheng, Cheng Shi, Sibei Yang

Referring image segmentation aims to segment the target referent in an image conditioning on a natural language expression.

Contrastive Learning Image Segmentation +3

Grounded Image Text Matching with Mismatched Relation Reasoning

no code implementations • ICCV 2023 • Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.

Image-text matching Relation +2

WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language

no code implementations • 12 Apr 2023 • Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, Yuexin Ma

We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data, including 2D images and 3D LiDAR point clouds.

Autonomous Driving Object Localization +1

PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis

1 code implementation • 2 Jan 2023 • Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, Yizhou Yu

Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative, whose goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views.

Brain Tumor Segmentation Organ Segmentation +3

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

no code implementations • 27 Sep 2022 • Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few.

Graph Representation Learning object-detection +3

Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts

2 code implementations • ICCV 2021 • Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Xiaoguang Han, Yizhou Yu

From this perspective, we introduce Preservational Learning to reconstruct diverse image contexts in order to preserve more information in learned representations.

Contrastive Learning Representation Learning +1

ConvNets vs. Transformers: Whose Visual Representations are More Transferable?

no code implementations • 11 Aug 2021 • Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Yizhou Yu

Vision transformers have attracted much attention from computer vision researchers as they are not restricted to the spatial inductive bias of ConvNets.

Classification Depth Estimation +5

Bottom-Up Shift and Reasoning for Referring Image Segmentation

1 code implementation • CVPR 2021 • Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, Yizhou Yu

In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules.

Image Segmentation Segmentation +2

Graph-Structured Referring Expression Reasoning in The Wild

1 code implementation • CVPR 2020 • Sibei Yang, Guanbin Li, Yizhou Yu

The linguistic structure of a referring expression provides a layout of reasoning over the visual contents, and it is often crucial to align and jointly understand the image and the referring expression.

Referring Expression

Dynamic Graph Attention for Referring Expression Comprehension

no code implementations • ICCV 2019 • Sibei Yang, Guanbin Li, Yizhou Yu

In this paper, we explore the problem of referring expression comprehension from the perspective of language-driven visual reasoning, and propose a dynamic graph attention network to perform multi-step reasoning by modeling both the relationships among the objects in the image and the linguistic structure of the expression.

Graph Attention Referring Expression +2

Relationship-Embedded Representation Learning for Grounding Referring Expressions

1 code implementation • CVPR 2019 • Sibei Yang, Guanbin Li, Yizhou Yu

Unfortunately, existing work on grounding referring expressions fails to accurately extract multi-order relationships from the referring expression and associate them with the objects and their related contexts in the image.

Referring Expression Representation Learning

Cross-Modal Relationship Inference for Grounding Referring Expressions

no code implementations • CVPR 2019 • Sibei Yang, Guanbin Li, Yizhou Yu

A feasible solution for grounding referring expressions not only needs to extract all the necessary information (i.e., objects and the relationships among them) in both the image and referring expressions, but also compute and represent multimodal contexts from the extracted information.

Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

no code implementations • 27 Apr 2019 • Xiang He, Sibei Yang, Guanbin Li, Haofeng Li, Huiyou Chang, Yizhou Yu

In this paper, we discover that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks.

Image Segmentation Lesion Segmentation +3

Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

no code implementations • CVPR 2018 • Weifeng Ge, Sibei Yang, Yizhou Yu

In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation.

Ranked #15 on Weakly Supervised Object Detection on PASCAL VOC 2007

Clustering General Classification +12

Introduction to Clustering Algorithms and Applications

no code implementations • 20 Aug 2014 • Sibei Yang, Liangde Tao, Bingchen Gong

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure.

Clustering
