Search Results for author: Song Bai

Found 84 papers, 47 papers with code

Hypergraph Convolution and Hypergraph Attention

1 code implementation23 Jan 2019 Song Bai, Feihu Zhang, Philip H. S. Torr

To efficiently learn deep embeddings on the high-order graph-structured data, we introduce two end-to-end trainable operators to the family of graph neural networks, i. e., hypergraph convolution and hypergraph attention.

Node Classification Representation Learning

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

3 code implementations CVPR 2022 Peize Sun, Jinkun Cao, Yi Jiang, Zehuan Yuan, Song Bai, Kris Kitani, Ping Luo

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, and following re-identification (re-ID) for object association.

Multi-Object Tracking Object +3

Asymmetric Non-local Neural Networks for Semantic Segmentation

5 code implementations ICCV 2019 Zhen Zhu, Mengde Xu, Song Bai, Tengteng Huang, Xiang Bai

The non-local module works as a particularly useful technique for semantic segmentation while criticized for its prohibitive computation and GPU memory occupation.

Segmentation Semantic Segmentation

CenterNet: Keypoint Triplets for Object Detection

20 code implementations ICCV 2019 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions.

Object object-detection +1

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

3 code implementations26 Jun 2023 Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

In this work, we extend this editing framework to diffusion models and propose a novel approach DragDiffusion.

SeqFormer: Sequential Transformer for Video Instance Segmentation

2 code implementations15 Dec 2021 Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms shall be done with each frame independently.

Instance Segmentation Semantic Segmentation +1

In Defense of Online Models for Video Instance Segmentation

1 code implementation21 Jul 2022 Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance.

Ranked #9 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Contrastive Learning Instance Segmentation +5

InstMove: Instance Motion for Object-centric Video Segmentation

1 code implementation CVPR 2023 Qihao Liu, Junfeng Wu, Yi Jiang, Xiang Bai, Alan Yuille, Song Bai

A common solution is to use optical flow to provide motion information, but essentially it only considers pixel-level motion, which still relies on appearance similarity and hence is often inaccurate under occlusion and fast movement.

Object Optical Flow Estimation +3

TransMix: Attend to Mix for Vision Transformers

2 code implementations CVPR 2022 Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, Song Bai

The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map.

Instance Segmentation object-detection +3

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

1 code implementation ICCV 2023 Henghui Ding, Chang Liu, Shuting He, Xudong Jiang, Philip H. S. Torr, Song Bai

However, since the target objects in these existing datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied.

Object Segmentation +3

Holistically-Attracted Wireframe Parsing

1 code implementation CVPR 2020 Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

For computing line segment proposals, a novel exact dual representation is proposed which exploits a parsimonious geometric reparameterization for line segments and forms a holistic 4-dimensional attraction field map for an input image.

Line Segment Detection Wireframe Parsing

Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning

1 code implementation24 Oct 2022 Nan Xue, Tianfu Wu, Song Bai, Fu-Dong Wang, Gui-Song Xia, Liangpei Zhang, Philip H. S. Torr

This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions.

Self-Supervised Learning Wireframe Parsing

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations9 Jul 2018 Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Multiple Instance Learning Object +3

XingGAN for Person Image Generation

2 code implementations ECCV 2020 Hao Tang, Song Bai, Li Zhang, Philip H. S. Torr, Nicu Sebe

We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i. e., translating the pose of a given person to a desired one.

 Ranked #1 on Pose Transfer on Market-1501 (IS metric)

Generative Adversarial Network Pose Transfer

CenterNet++ for Object Detection

2 code implementations18 Apr 2022 Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian

Our approach, named CenterNet, detects each object as a triplet keypoints (top-left and bottom-right corners and the center keypoint).

Object object-detection +1

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

1 code implementation ICCV 2023 Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou

Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e. g., SwinIR) can significantly improve the model performance but the computation overhead is also considerable.

Image Super-Resolution

Anchor-Free Person Search

1 code implementation CVPR 2021 Yichao Yan, Jinpeng Li, Jie Qin, Song Bai, Shengcai Liao, Li Liu, Fan Zhu, Ling Shao

Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id).

Pedestrian Detection Person Re-Identification +1

Is synthetic data from generative models ready for image recognition?

1 code implementation14 Oct 2022 Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Text-to-Image Generation Transfer Learning

Improving Transferability of Adversarial Examples with Input Diversity

2 code implementations CVPR 2019 Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jian-Yu Wang, Zhou Ren, Alan Yuille

We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

Adversarial Attack Image Classification

Location-Sensitive Visual Recognition with Cross-IOU Loss

1 code implementation11 Apr 2021 Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang, Qi Tian

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks.

2D Human Pose Estimation Instance Segmentation +5

End-to-end Temporal Action Detection with Transformer

1 code implementation18 Jun 2021 Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video.

Action Detection Temporal Action Localization +1

Bipartite Graph Reasoning GANs for Person Image Generation

1 code implementation10 Aug 2020 Hao Tang, Song Bai, Philip H. S. Torr, Nicu Sebe

We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.

 Ranked #1 on Pose Transfer on Market-1501 (PCKh metric)

Pose Transfer

Visual Parser: Representing Part-whole Hierarchies with Transformers

2 code implementations13 Jul 2021 Shuyang Sun, Xiaoyu Yue, Song Bai, Philip Torr

To model the representations of the two levels, we first encode the information from the whole into part vectors through an attention mechanism, then decode the global information within the part vectors back into the whole representation.

Image Classification Instance Segmentation +3

Dual Attention GANs for Semantic Image Synthesis

1 code implementation29 Aug 2020 Hao Tang, Song Bai, Nicu Sebe

We also propose two novel modules, i. e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively.

Image Generation Position

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations CVPR 2020 Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, it has been rarely explored to embed the NL blocks in mobile neural networks, mainly due to the following challenges: 1) NL blocks generally have heavy computation cost which makes it difficult to be applied in applications where computational resources are limited, and 2) it is an open problem to discover an optimal configuration to embed NL blocks into mobile neural networks.

Image Classification Neural Architecture Search

SwiftNet: Real-time Video Object Segmentation

1 code implementation CVPR 2021 Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai

In this work we present SwiftNet for real-time semisupervised video object segmentation (one-shot VOS), which reports 77. 8% J &F and 70 FPS on DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed performance.

Object Segmentation +3

An Empirical Study of End-to-End Temporal Action Detection

1 code implementation CVPR 2022 Xiaolong Liu, Song Bai, Xiang Bai

Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD.

Action Classification Action Detection +2

Occluded Video Instance Segmentation: A Benchmark

2 code implementations2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16. 3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Instance Segmentation Segmentation +3

Learning Transferable Adversarial Examples via Ghost Networks

1 code implementation9 Dec 2018 Yingwei Li, Song Bai, Yuyin Zhou, Cihang Xie, Zhishuai Zhang, Alan Yuille

The critical principle of ghost networks is to apply feature-level perturbations to an existing model to potentially create a huge set of diverse models.

Adversarial Attack

Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

2 code implementations1 Oct 2022 Yujun Shi, Jian Liang, Wenqing Zhang, Vincent Y. F. Tan, Song Bai

To remedy this problem caused by the data heterogeneity, we propose {\sc FedDecorr}, a novel method that can effectively mitigate dimensional collapse in federated learning.

Federated Learning

Triplet-Center Loss for Multi-View 3D Object Retrieval

1 code implementation CVPR 2018 Xinwei He, Yang Zhou, Zhichao Zhou, Song Bai, Xiang Bai

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with deep metric learning for 3D object retrieval is more or less neglected.

3D Object Recognition 3D Object Retrieval +6

Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses

1 code implementation ECCV 2020 Yingwei Li, Song Bai, Cihang Xie, Zhenyu Liao, Xiaohui Shen, Alan L. Yuille

We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations.

object-detection Object Detection +1

Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning

1 code implementation CVPR 2022 Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan

Specifically, we experimentally show that directly encouraging CIL Learner at the initial phase to output similar representations as the model jointly trained on all classes can greatly boost the CIL performance.

Class Incremental Learning Incremental Learning

Adversarial Metric Attack and Defense for Person Re-identification

1 code implementation30 Jan 2019 Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H. S. Torr

However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images.

Adversarial Attack Benchmarking +2

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations20 Dec 2019 Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.

Crowd Counting Model Optimization

Learn to Interpret Atari Agents

1 code implementation29 Dec 2018 Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

Deep reinforcement learning (DeepRL) agents surpass human-level performance in many tasks.

Decision Making

Hard-Aware Point-to-Set Deep Metric for Person Re-identification

1 code implementation ECCV 2018 Rui Yu, Zhiyong Dou, Song Bai, Zhao-Xiang Zhang, Yongchao Xu, Xiang Bai

Person re-identification (re-ID) is a highly challenging task due to large variations of pose, viewpoint, illumination, and occlusion.

Metric Learning Person Re-Identification

Fourier Document Restoration for Robust Document Dewarping and Recognition

1 code implementation CVPR 2022 Chuhui Xue, Zichen Tian, Fangneng Zhan, Shijian Lu, Song Bai

State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents which are prone to errors while dealing with documents with irregular distortions or large variations in depth.

Semi-Supervised Multi-Organ Segmentation via Deep Multi-Planar Co-Training

no code implementations7 Apr 2018 Yuyin Zhou, Yan Wang, Peng Tang, Song Bai, Wei Shen, Elliot K. Fishman, Alan L. Yuille

In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require lots of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain.

Image Segmentation Organ Segmentation +2

Multidimensional Scaling on Multiple Input Distance Matrices

no code implementations1 May 2016 Song Bai, Xiang Bai, Longin Jan Latecki, Qi Tian

How to do multidimensional scaling on multiple input distance matrices is still unsolved to our best knowledge.

Scalable Person Re-identification on Supervised Smoothed Manifold

no code implementations CVPR 2017 Song Bai, Xiang Bai, Qi Tian

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images.

Person Re-Identification

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations25 Sep 2014 Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combing the global deep learning representation and the local descriptor representation, our method can obtain the state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5

Ensemble Diffusion for Retrieval

no code implementations ICCV 2017 Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This stimulates a great research interest of considering similarity fusion in the framework of diffusion process (i. e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +2

Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification

no code implementations CVPR 2019 Song Bai, Peng Tang, Philip H.S. Torr, Longin Jan Latecki

This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification with a specific concentration on an ensemble of multiple metrics (or similarities).

3D Shape Classification 3D Shape Retrieval +4

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations ICCV 2019 Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations ICCV 2019 MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

View N-gram Network for 3D Object Retrieval

no code implementations ICCV 2019 Xinwei He, Tengteng Huang, Song Bai, Xiang Bai

By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object.

3D Object Retrieval 3D Shape Classification +3

Learning Regional Attraction for Line Segment Detection

no code implementations18 Dec 2019 Nan Xue, Song Bai, Fu-Dong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang, Philip H. S. Torr

Given a line segment map, the proposed regional attraction first establishes the relationship between line segments and regions in the image lattice.

Line Segment Detection

FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition

no code implementations22 Jul 2020 Wenqing Zhang, Yang Qiu, Song Bai, Rui Zhang, Xiaolin Wei, Xiang Bai

In this paper, we study how to make use of decentralized datasets for training a robust scene text recognizer while keeping them stay on local devices.

Federated Learning Privacy Preserving +1

Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training

no code implementations21 Oct 2020 Xiaofeng Liu, Yuzhuo Han, Song Bai, Yi Ge, Tianxing Wang, Xu Han, Site Li, Jane You, Ju Lu

However, the cross entropy loss can not take the different importance of each class in an self-driving system into account.

Segmentation Self-Driving Cars +1

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

no code implementations18 May 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai

The first task focuses on image-to-character (I2C) mapping which detects a set of character candidates from images based on different alignments of visual features in an non-sequential way.

Scene Text Recognition

PlaneTR: Structure-Guided Transformers for 3D Plane Recovery

no code implementations ICCV 2021 Bin Tan, Nan Xue, Song Bai, Tianfu Wu, Gui-Song Xia

This paper presents a neural network built upon Transformers, namely PlaneTR, to simultaneously detect and reconstruct planes from a single image.

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

no code implementations3 Sep 2021 Xinwei He, Silin Cheng, Dingkang Liang, Song Bai, Xi Wang, Yingying Zhu

To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.

3D Object Classification 3D Object Retrieval +3

Contextual Text Detection

no code implementations29 Sep 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang

This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.

Text Detection

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

no code implementations8 Mar 2022 Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai

Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction among textual and visual features for learning effective scene text representations.

Optical Character Recognition Optical Character Recognition (OCR) +2

Contextual Text Block Detection towards Scene Text Understanding

no code implementations26 Jul 2022 Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.

text-classification Text Classification +2

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

no code implementations29 Jul 2022 Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions.

3D Human Pose Estimation 3D Multi-Person Pose Estimation (absolute) +2

Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition

no code implementations4 Aug 2022 Zhangzi Zhu, Yu Hao, Wenqing Zhang, Chuhui Xue, Song Bai

This report presents our 2nd place solution to ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST) : Cropped Word Recognition.

The Runner-up Solution for YouTube-VIS Long Video Challenge 2022

no code implementations18 Nov 2022 Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai

This technical report describes our 2nd-place solution for the ECCV 2022 YouTube-VIS Long Video Challenge.

Contrastive Learning Instance Segmentation +2

LUMix: Improving Mixup by Better Modelling Label Uncertainty

no code implementations29 Nov 2022 Shuyang Sun, Jie-Neng Chen, Ruifei He, Alan Yuille, Philip Torr, Song Bai

LUMix is simple as it can be implemented in just a few lines of code and can be universally applied to any deep networks \eg CNNs and Vision Transformers, with minimal computational cost.

Data Augmentation

PV3D: A 3D Generative Model for Portrait Video Generation

no code implementations13 Dec 2022 Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou

While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos.

Video Generation

Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search

no code implementations1 Jun 2023 Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille

(2) We find regions in the latent space that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured.

Adversarial Attack Efficient Exploration +1

Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

no code implementations1 Aug 2023 Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi

To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes.

3D Open-Vocabulary Instance Segmentation Instance Segmentation +4

Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks

no code implementations13 Aug 2023 David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou

Despite the rapid advancement of unsupervised learning in visual representation, it requires training on large-scale datasets that demand costly data collection, and pose additional challenges due to concerns regarding data privacy.

Contrastive Learning Image Classification +2

Dataset Condensation via Generative Model

no code implementations14 Sep 2023 David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

Dataset condensation aims to condense a large dataset with a lot of training samples into a small set.

Dataset Condensation

Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery

no code implementations5 Dec 2023 Yansheng Li, Junwei Luo, Yongjun Zhang, Yihua Tan, Jin-Gang Yu, Song Bai

Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs.

object-detection Object Detection

Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human

no code implementations5 Jan 2024 Song Bai, Jie Li

Since the year 2023 an abundant amount of research papers has emerged in the domain of 3D generation.

3D Generation

Debiasing Text-to-Image Diffusion Models

no code implementations22 Feb 2024 Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

Despite its simplicity, we show that IDA shows efficiency and fast convergence in resolving the social bias in TTI diffusion models.

Cannot find the paper you are looking for? You can Submit a new open access paper.