7 code implementations • 24 Jun 2021 • Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan
Though the recently prevailing vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs when no extra data are provided.
Ranked #1 on Image Classification on VizWiz-Classification
2 code implementations • CVPR 2022 • Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, WangMeng Zuo, Qibin Hou, Ming-Ming Cheng
Previous KD methods for object detection mostly focus on imitating deep features within the imitation regions rather than mimicking classification logits, as the latter is inefficient at distilling localization information and yields only trivial improvements.
3 code implementations • IEEE 2021 • Peng-Tao Jiang, Chang-Bin Zhang, Qibin Hou, Ming-Ming Cheng, Yunchao Wei
To evaluate the quality of the class activation maps produced by LayerCAM, we apply them to weakly-supervised object localization and semantic segmentation.
3 code implementations • 18 Sep 2022 • Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, ZhengNing Liu, Ming-Ming Cheng, Shi-Min Hu
Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 of its parameters.
Ranked #1 on Semantic Segmentation on iSAID
4 code implementations • ECCV 2020 • Zhou Daquan, Qibin Hou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
In this paper, we rethink the necessity of such design changes and find that they may bring risks of information loss and gradient confusion.
2 code implementations • CVPR 2021 • Qibin Hou, Daquan Zhou, Jiashi Feng
Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps.
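The mechanism described above can be illustrated with a minimal PyTorch sketch of coordinate attention (not the paper's exact implementation — the reduction ratio, normalization, and activation choices here are simplifying assumptions): global pooling is factorized into two 1D pools along height and width so the resulting attention maps retain positional information.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Simplified coordinate-attention sketch: pool along width and
    height separately, share a 1x1 transform, then produce direction-
    aware attention maps that are multiplied back onto the input."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 4)
        self.conv1 = nn.Conv2d(channels, mid, 1)   # shared transform
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)  # height-wise attention
        self.conv_w = nn.Conv2d(mid, channels, 1)  # width-wise attention

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                  # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).transpose(2, 3)  # (n, c, w, 1)
        y = self.act(self.conv1(torch.cat([x_h, x_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (n, c, 1, w)
        return x * a_h * a_w  # broadcast over both spatial axes
```

Unlike squeeze-and-excitation, the two 1D pools keep track of *where* along each axis the response comes from, which is what makes the attention spatially selective.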
5 code implementations • CVPR 2019 • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang
We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway.
Ranked #1 on RGB Salient Object Detection on SOD
6 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng
In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).
Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)
5 code implementations • 6 Oct 2020 • Diganta Misra, Trikay Nalamada, Ajay Uppili Arasanipalai, Qibin Hou
In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure.
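A minimal sketch of the three-branch idea, under the assumption that each branch applies a simple spatial attention gate after rotating the tensor (kernel size and pooling choices here are illustrative, not the paper's exact configuration): each branch captures interactions between a different pair of dimensions.

```python
import torch
import torch.nn as nn

def z_pool(x):
    # concatenate max- and mean-pooling over the leading (channel) dim
    return torch.cat([x.max(dim=1, keepdim=True).values,
                      x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    """2-channel pooled summary -> conv -> sigmoid gate."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(z_pool(x)))

class TripletAttention(nn.Module):
    """Sketch of triplet attention: rotate the tensor so each branch
    gates a different dimension pair, then average the three results."""
    def __init__(self):
        super().__init__()
        self.gate_ch = AttentionGate()
        self.gate_cw = AttentionGate()
        self.gate_hw = AttentionGate()

    def forward(self, x):
        # branch 1: swap C and H, attend over (C, W), swap back
        x1 = self.gate_ch(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # branch 2: swap C and W, attend over (H, C), swap back
        x2 = self.gate_cw(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # branch 3: ordinary spatial attention over (H, W)
        x3 = self.gate_hw(x)
        return (x1 + x2 + x3) / 3.0
```

The rotations are what give the method its "cross-dimension interaction": two of the branches let channel information modulate spatial positions and vice versa, at negligible parameter cost.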
2 code implementations • CVPR 2020 • Qibin Hou, Li Zhang, Ming-Ming Cheng, Jiashi Feng
Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing.
Ranked #32 on Semantic Segmentation on Cityscapes test
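The long-range pooling idea above can be sketched as follows — a hedged simplification in which the strip pools are plain axis-wise means refined by small 1D convolutions (the actual module uses additional mixing; kernel sizes here are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Sketch of strip pooling: average over 1xW and Hx1 strips to
    capture long-range context along each axis, refine with 1D convs,
    broadcast back to the full map, and fuse into a sigmoid gate."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        sh = self.conv_h(x.mean(dim=3, keepdim=True))  # (n, c, h, 1)
        sw = self.conv_w(x.mean(dim=2, keepdim=True))  # (n, c, 1, w)
        y = self.fuse(F.relu(sh + sw))  # broadcast sum -> (n, c, h, w)
        return x * torch.sigmoid(y)
```

Because each strip spans an entire row or column, every output position is informed by pixels arbitrarily far away along that axis — the long-range context that square pooling windows struggle to capture.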
1 code implementation • 12 Apr 2022 • Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, WangMeng Zuo, Ming-Ming Cheng
Combining these two new components, we show for the first time that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
1 code implementation • ICCV 2023 • YuXuan Li, Qibin Hou, Zhaohui Zheng, Ming-Ming Cheng, Jian Yang, Xiang Li
To the best of our knowledge, this is the first time that large and selective kernel mechanisms have been explored in the field of remote sensing object detection.
Ranked #1 on Semantic Segmentation on UAVid
1 code implementation • 18 Mar 2024 • YuXuan Li, Xiang Li, Yimain Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang
While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios.
2 code implementations • 15 Jul 2019 • Deng-Ping Fan, Zheng Lin, Jia-Xing Zhao, Yun Liu, Zhao Zhang, Qibin Hou, Menglong Zhu, Ming-Ming Cheng
The use of RGB-D information for salient object detection has been extensively explored in recent years.
Ranked #4 on RGB-D Salient Object Detection on RGBD135
4 code implementations • CVPR 2017 • Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).
Ranked #4 on RGB Salient Object Detection on SBU
1 code implementation • 10 Aug 2023 • Yuming Chen, Xinbin Yuan, Ruiqi Wu, Jiabao Wang, Qibin Hou, Ming-Ming Cheng
We aim at providing the object detection community with an efficient and performant object detector, termed YOLO-MS.
1 code implementation • 11 Mar 2024 • YuXuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
Ranked #1 on 2D Object Detection on SARDet-100K (using extra training data)
1 code implementation • CVPR 2023 • Zhen Li, Zuo-Liang Zhu, Ling-Hao Han, Qibin Hou, Chun-Le Guo, Ming-Ming Cheng
It is based on two essential designs.
4 code implementations • 23 Jun 2021 • Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
Realizing the importance of the positional information carried by 2D feature representations, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections, unlike recent MLP-like models that encode spatial information along the flattened spatial dimensions.
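The axis-wise encoding can be sketched with a minimal Permute-MLP-style layer (a simplification: the real module also splits channels into segments before permuting, which is omitted here): separate linear projections mix information along height, width, and channels, and the three encodings are summed.

```python
import torch
import torch.nn as nn

class PermuteMLP(nn.Module):
    """Sketch of Vision Permutator's axis-wise token mixing: one linear
    projection per dimension (height, width, channel), applied by
    permuting the target axis into the last position."""
    def __init__(self, dim, h, w):
        super().__init__()
        self.proj_h = nn.Linear(h, h)      # mixes along height
        self.proj_w = nn.Linear(w, w)      # mixes along width
        self.proj_c = nn.Linear(dim, dim)  # mixes along channels
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, h, w, dim)
        xh = self.proj_h(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        xw = self.proj_w(x.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
        xc = self.proj_c(x)
        return self.out(xh + xw + xc)
```

Keeping height and width as separate mixing axes is what preserves positional structure that flattened-token MLP mixers discard.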
1 code implementation • ICCV 2023 • Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou
Previous works have shown that increasing the window size of Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve model performance, but the computation overhead is also considerable.
1 code implementation • 14 Dec 2023 • Hao Shao, Quansheng Zeng, Qibin Hou, Jufeng Yang
To handle the significant variations of lesion regions or organs in size and shape, we also use multiple strip-shaped convolution kernels of different sizes in each axial attention path to improve the efficiency of the proposed MCA in encoding spatial information.
1 code implementation • 14 Dec 2023 • Hao Shao, Yang Zhang, Qibin Hou
We present a new boundary sensitive framework for polyp segmentation, called Polyper.
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled deeper.
Ranked #426 on Image Classification on ImageNet
1 code implementation • 22 Nov 2022 • Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng
This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features.
1 code implementation • 20 Jun 2023 • Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou
Moreover, as mimicking the teacher's predictions is the target of KD, CrossKD offers more task-oriented information in contrast with feature imitation.
1 code implementation • 18 Sep 2023 • Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou
We present DFormer, a novel RGB-D pretraining framework to learn transferable representations for RGB-D segmentation tasks.
Ranked #1 on RGB-D Salient Object Detection on DES
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #174 on Image Classification on ImageNet
2 code implementations • 25 Nov 2020 • Chang-Bin Zhang, Peng-Tao Jiang, Qibin Hou, Yunchao Wei, Qi Han, Zhen Li, Ming-Ming Cheng
Experiments demonstrate that based on the same classification models, the proposed approach can effectively improve the classification performance on CIFAR-100, ImageNet, and fine-grained datasets.
1 code implementation • CVPR 2019 • Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu
Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch.
1 code implementation • 7 Jun 2023 • Boyuan Sun, YuQi Yang, Le Zhang, Ming-Ming Cheng, Qibin Hou
Motivated by these, we aim to improve the use efficiency of unlabeled data by designing two novel label propagation strategies.
1 code implementation • 10 Dec 2022 • Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc van Gool
Identifying and segmenting camouflaged objects from the background is challenging.
1 code implementation • 13 Jun 2023 • Xuying Zhang, Bowen Yin, Zheng Lin, Qibin Hou, Deng-Ping Fan, Ming-Ming Cheng
We consider the problem of referring camouflaged object detection (Ref-COD), a new task that aims to segment specified camouflaged objects based on a small set of referring images with salient target objects.
1 code implementation • CVPR 2022 • Peng-Tao Jiang, YuQi Yang, Qibin Hou, Yunchao Wei
Our framework guides the global network to learn rich object-detail knowledge captured from a global view, thereby producing high-quality attention maps that can be used directly as pseudo annotations for semantic segmentation networks.
Ranked #16 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)
1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
1 code implementation • 28 Mar 2023 • Senmao Li, Joost Van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.
Ranked #7 on Text-based Image Editing on PIE-Bench
1 code implementation • 14 Jan 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ming-Ming Cheng
In this paper, we study the spatial disequilibrium problem of modern object detectors and propose to quantify this ``spatial bias'' by measuring the detection performance over zones.
1 code implementation • 20 Oct 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, Ming-Ming Cheng
A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders.
1 code implementation • 8 Feb 2024 • Senmao Li, Joost Van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang
However, these models struggle to effectively suppress the generation of undesired content, which is explicitly requested to be omitted from the generated image in the prompt.
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
1 code implementation • 10 Dec 2023 • Yunheng Li, Zhongyu Li, ShangHua Gao, Qilong Wang, Qibin Hou, Ming-Ming Cheng
Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences.
1 code implementation • 26 Mar 2024 • YuQi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li
Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network.
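The low-rank expert idea can be sketched as follows — a hedged illustration, not the paper's exact module (the rank, kernel size, and the placement of the 3x3 convolution in the rank-r space are assumptions): a full CxC convolution is replaced by a down-project / small conv / up-project factorization, keeping per-expert parameters small.

```python
import torch
import torch.nn as nn

class LowRankConvExpert(nn.Module):
    """Sketch of a LoRA-style low-rank convolution expert: 1x1
    down-projection to rank r, a 3x3 conv in the rank-r space,
    then a 1x1 up-projection back to the full channel count."""
    def __init__(self, channels, rank=4):
        super().__init__()
        self.down = nn.Conv2d(channels, rank, 1, bias=False)
        self.conv = nn.Conv2d(rank, rank, 3, padding=1, bias=False)
        self.up = nn.Conv2d(rank, channels, 1, bias=False)

    def forward(self, x):
        return self.up(self.conv(self.down(x)))

def count_params(m):
    return sum(p.numel() for p in m.parameters())
```

For 32 channels and rank 4, the factorized expert needs roughly 400 parameters versus about 9.2k for a full 3x3 convolution, which is what makes scaling the number of experts affordable.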
1 code implementation • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen
To date, most existing datasets focus on autonomous driving scenes.
1 code implementation • 24 Mar 2021 • Yang Cao, Zhengqiang Zhang, Enze Xie, Qibin Hou, Kai Zhao, Xiangui Luo, Jian Tuo
However, these methods usually encounter a boundary-related imbalance problem, leading to limited generation capability.
1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
no code implementations • 27 Mar 2018 • Qibin Hou, Jiang-Jiang Liu, Ming-Ming Cheng, Ali Borji, Philip H. S. Torr
Although these tasks are inherently very different, we show that our unified approach performs very well on all of them and works far better than current single-purpose state-of-the-art methods.
no code implementations • 27 Mar 2018 • Qibin Hou, Ming-Ming Cheng, Jiang-Jiang Liu, Philip H. S. Torr
In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods.
no code implementations • ECCV 2018 • Deng-Ping Fan, Ming-Ming Cheng, Jiang-Jiang Liu, Shang-Hua Gao, Qibin Hou, Ali Borji
Our analysis identifies a serious design bias of existing SOD datasets which assumes that each image contains at least one clearly outstanding salient object in low clutter.
no code implementations • 6 Dec 2016 • Jia-Xing Zhao, Ren Bo, Qibin Hou, Ming-Ming Cheng, Paul L. Rosin
It also suffers from a slow convergence rate as a result of both the fixed search region and performing the assignment and update steps separately.
no code implementations • 18 Nov 2014 • Ali Borji, Ming-Ming Cheng, Qibin Hou, Huaizu Jiang, Jia Li
Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision.
no code implementations • NeurIPS 2018 • Qibin Hou, Peng-Tao Jiang, Yunchao Wei, Ming-Ming Cheng
To test the quality of the generated attention maps, we employ the mined object regions as heuristic cues for learning semantic segmentation models.
no code implementations • ECCV 2018 • Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu
We also combine our method with Mask R-CNN for instance segmentation, and demonstrate for the first time the ability to perform weakly supervised instance segmentation using only keyword annotations.
Ranked #4 on Image-level Supervised Instance Segmentation on COCO test-dev (using extra training data)
no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set, and it has been demonstrated to be effective for 1D convolutional neural networks (CNNs).
no code implementations • 25 Feb 2020 • Zun Li, Congyan Lang, Junhao Liew, Qibin Hou, Yidong Li, Jiashi Feng
Feature pyramid network (FPN) based models, which fuse the semantics and salient details in a progressive manner, have been proven highly effective in salient object detection.
no code implementations • 30 Mar 2020 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
To successfully align the multi-modal data structures across domains, subsequent works exploit discriminative information in the adversarial training process, e.g., using multiple class-wise discriminators and introducing conditional information into the input or output of the domain discriminator.
no code implementations • 18 Apr 2020 • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng
To evaluate the performance of our proposed network on these tasks, we conduct exhaustive experiments on multiple representative datasets.
no code implementations • 14 Jun 2020 • Kuangqi Zhou, Qibin Hou, Zun Li, Jiashi Feng
In this paper, we propose a novel multi-miner framework to perform a region mining process that adapts to diverse object sizes and is thus able to mine more integral and finer object regions.
no code implementations • 25 Sep 2019 • Dapeng Hu, Jian Liang*, Qibin Hou, Hanshu Yan, Jiashi Feng
Previous adversarial learning methods condition domain alignment only on pseudo labels, but noisy and inaccurate pseudo labels may perturb the multi-class distribution embedded in probabilistic predictions, hence bringing insufficient alleviation to the latent mismatch problem.
no code implementations • 14 Dec 2022 • Le Zhang, Qibin Hou, Yun Liu, Jia-Wang Bian, Xun Xu, Joey Tianyi Zhou, Ce Zhu
Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm.
no code implementations • 15 Jan 2023 • Cheng-Ze Lu, Xiaojie Jin, Zhicheng Huang, Qibin Hou, Ming-Ming Cheng, Jiashi Feng
Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential of learning expressive feature representations in visual image recognition.
no code implementations • 24 May 2023 • Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, Jiashi Feng
The study reveals that: 1) MIM can be viewed as an effective method to improve the model capacity when the scale of the training data is relatively small; 2) Strong reconstruction targets can endow the models with increased capacities on downstream tasks; 3) MIM pre-training is data-agnostic under most scenarios, which means that the strategy of sampling pre-training data is non-critical.
no code implementations • 8 Sep 2023 • Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng
In this work, we identify that a crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning between the prompt and the output image.
no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
For MoV, we utilize the text-to-speech (TTS) algorithms with a variety of pre-defined tones and select the most matching one based on the user-provided text description automatically.
no code implementations • 7 Dec 2023 • Xuying Zhang, Bo-Wen Yin, Yuming Chen, Zheng Lin, Yunheng Li, Qibin Hou, Ming-Ming Cheng
Particularly, a cross-modal graph is constructed to accurately align the object points and the noun phrases decoupled from the 3D mesh and the textual description, respectively.
no code implementations • 14 Feb 2024 • Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi
Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs.
no code implementations • 27 Feb 2024 • XuanYi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng
We employ a method that transforms the generated videos into 3D models, leveraging the premise that the accuracy of 3D reconstruction is heavily contingent on the video quality.