Search Results for author: Weiming Hu

Found 80 papers, 25 papers with code

Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

1 code implementation • 18 Apr 2024 • Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

In this paper, we ask whether the fine-tuning performance of extremely simple, small-scale ViTs can also benefit from this pre-training paradigm, a question that has been studied considerably less than the well-established methodology of designing lightweight architectures with sophisticated components.

Contrastive Learning, Image Classification (+2 more)

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

no code implementations • 22 Mar 2024 • Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation.

3D Generation

BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

no code implementations • 11 Mar 2024 • Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao

To tackle these issues, we design a new BEV-enhanced VPR framework, namely BEV2PR, which generates a composite descriptor with both visual cues and spatial awareness based solely on a single camera.

Visual Place Recognition

PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts

no code implementations • 8 Mar 2024 • Zewen Chen, Haina Qin, Juan Wang, Chunfeng Yuan, Bing Li, Weiming Hu, Liang Wang

On the other hand, PromptIQA is trained on a mixed dataset with two proposed data augmentation strategies to learn diverse requirements, thus enabling it to effectively adapt to new requirements.

Data Augmentation, No-Reference Image Quality Assessment

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.

Representation Learning

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.

Retrieval, Text Retrieval (+1 more)

NFT1000: A Visual Text Dataset For Non-Fungible Token Retrieval

no code implementations • 29 Jan 2024 • Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li, Weiming Hu

With the rise of the Metaverse and Web3.0, NFTs (Non-Fungible Tokens) have emerged as pivotal digital assets, garnering significant attention.

Retrieval

GMC-IQA: Exploiting Global-correlation and Mean-opinion Consistency for No-reference Image Quality Assessment

no code implementations • 19 Jan 2024 • Zewen Chen, Juan Wang, Bing Li, Chunfeng Yuan, Weiming Hu, Junxian Liu, Peng Li, Yan Wang, Youqun Zhang, Congxuan Zhang

Due to the subjective nature of image quality assessment (IQA), assessing which image has better quality among a sequence of images is more reliable than assigning an absolute mean opinion score for an image.

No-Reference Image Quality Assessment

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

no code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma

I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.

Video Generation

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

no code implementations • 25 Dec 2023 • Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu

Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.

Caption Generation, Video Captioning

ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual Tracking

1 code implementation • NeurIPS 2023 • Yutong Kou, Jin Gao, Bing Li, Gang Wang, Weiming Hu, Yizheng Wang, Liang Li

To this end, we non-uniformly resize the cropped image to a smaller input size, while keeping higher resolution in the area where the target is more likely to appear and lower resolution elsewhere.

Visual Tracking

AUNet: Learning Relations Between Action Units for Face Forgery Detection

no code implementations • CVPR 2023 • Weiming Bai, Yufan Liu, Zhipeng Zhang, Bing Li, Weiming Hu

Observing that face manipulation may alter the relations between different facial action units (AUs), we propose the Action Units Relation Learning framework to improve the generality of forgery detection.

DeepFake Detection, Face Swapping (+1 more)

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

no code implementations • CVPR 2023 • Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Weiming Hu, XiaoHu Qie, Jianping Wu

ViLEM then requires the model to judge the correctness of each word in the plausible negative texts and further to correct the wrong words by resorting to image information.

Contrastive Learning, Retrieval (+3 more)

DSPNet: Towards Slimmable Pretrained Networks based on Discriminative Self-supervised Learning

no code implementations • 13 Jul 2022 • Shaoru Wang, Zeming Li, Jin Gao, Liang Li, Weiming Hu

However, when facing various resource budgets in real-world applications, pretraining multiple networks of various sizes one by one imposes a huge computational burden.

Knowledge Distillation, Self-Supervised Learning

Cross-Architecture Knowledge Distillation

no code implementations • 12 Jul 2022 • Yufan Liu, Jiajiong Cao, Bing Li, Weiming Hu, Jingting Ding, Liang Li

However, most existing knowledge distillation methods only consider homologous-architecture distillation, such as distilling knowledge from CNN to CNN.

Knowledge Distillation

SiamMask: A Framework for Fast Online Object Tracking and Segmentation

no code implementations • 5 Jul 2022 • Weiming Hu, Qiang Wang, Li Zhang, Luca Bertinetto, Philip H. S. Torr

In this paper, we introduce SiamMask, a framework that performs both visual object tracking and video object segmentation in real time with the same simple method.

Multiple Object Tracking, Object (+5 more)

Narrowing the Gap: Improved Detector Training with Noisy Location Annotations

1 code implementation • 12 Jun 2022 • Shaoru Wang, Jin Gao, Bing Li, Weiming Hu

Experiments in both synthesized and real-world scenarios consistently demonstrate the effectiveness of our approach; e.g., our method raises the degraded performance of the FCOS detector on COCO from 33.6% AP to 35.6% AP.

Object Detection

A Closer Look at Self-Supervised Lightweight Vision Transformers

2 code implementations • 28 May 2022 • Shaoru Wang, Jin Gao, Zeming Li, Xiaoqin Zhang, Weiming Hu

We also point out some defects of such pre-training, e.g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks.

Contrastive Learning, Image Classification (+1 more)

A Comparative Study of Gastric Histopathology Sub-size Image Classification: from Linear Regression to Visual Transformer

no code implementations • 25 May 2022 • Weiming Hu, HaoYuan Chen, Wanli Liu, Xiaoyan Li, Hongzan Sun, Xinyu Huang, Marcin Grzegorzek, Chen Li

Ensemble learning is a way to improve the accuracy of algorithms, and finding multiple learning models of complementary types is the basis of ensemble learning.

BIG-bench Machine Learning, Ensemble Learning (+2 more)

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning

1 code implementation • CVPR 2022 • Li Yang, Yan Xu, Chunfeng Yuan, Wei Liu, Bing Li, Weiming Hu

They base the visual grounding on the features from pre-generated proposals or anchors, and fuse these features with the text embeddings to locate the target mentioned by the text.

Attribute, Object Detection (+2 more)

CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation

no code implementations • 31 Mar 2022 • Ziqi Zhang, Yuxin Chen, Zongyang Ma, Zhongang Qi, Chunfeng Yuan, Bing Li, Ying Shan, Weiming Hu

In this paper, we propose CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration benchmark, to facilitate research and applications in video titling and video retrieval in Chinese.

Retrieval, Video Captioning (+1 more)

Learning Target-aware Representation for Visual Tracking via Informative Interactions

no code implementations • 7 Jan 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu

The proposed GIM module and InBN mechanism are general and applicable to different backbone types, including CNNs and Transformers, as evidenced by our extensive experiments on multiple benchmarks.

Representation Learning, Visual Tracking

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

2 code implementations • CVPR 2020 • Jin Gao, Yan Lu, Xiaojuan Qi, Yutong Kou, Bing Li, Liang Li, Shan Yu, Weiming Hu

In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.

Continual Learning, One-Shot Learning (+1 more)

Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos

1 code implementation • 5 Nov 2021 • Minglang Qiao, Yufan Liu, Mai Xu, Xin Deng, Bing Li, Weiming Hu, Ali Borji

In this paper, we propose a multitask learning method for visual-audio saliency prediction and sound source localization on multi-face videos by leveraging visual, audio and face information.

Saliency Prediction

SDTP: Semantic-aware Decoupled Transformer Pyramid for Dense Image Prediction

no code implementations • 18 Sep 2021 • Zekun Li, Yufan Liu, Bing Li, Weiming Hu, Kebin Wu, Pei Wang

CDI builds global attention and interaction among different levels in a decoupled space, which also addresses the problem of heavy computation.

Differentiable Convolution Search for Point Cloud Processing

no code implementations • ICCV 2021 • Xing Nie, Yongcheng Liu, Shaohong Chen, Jianlong Chang, Chunlei Huo, Gaofeng Meng, Qi Tian, Weiming Hu, Chunhong Pan

It works in a purely data-driven manner and is thus capable of automatically creating a group of suitable convolutions for geometric shape modeling.

Learn to Match: Automatic Matching Network Design for Visual Tracking

1 code implementation • ICCV 2021 • Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu

Siamese tracking has achieved groundbreaking performance in recent years; its essence lies in an efficient matching operator, namely cross-correlation and its variants.

Visual Tracking

GasHisSDB: A New Gastric Histopathology Image Dataset for Computer Aided Diagnosis of Gastric Cancer

1 code implementation • 4 Jun 2021 • Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Jiquan Ma, Yong Zhang, HaoYuan Chen, Wanli Liu, Changhao Sun, YuDong Yao, Hongzan Sun, Marcin Grzegorzek

To show that image classification methods from different periods perform differently on GasHisSDB, we select a variety of classifiers for evaluation.

BIG-bench Machine Learning, Image Classification (+1 more)

A Comparison for Anti-noise Robustness of Deep Learning Classification Methods on a Tiny Object Image Dataset: from Convolutional Neural Network to Visual Transformer and Performer

no code implementations • 3 Jun 2021 • Ao Chen, Chen Li, HaoYuan Chen, Hechen Yang, Peng Zhao, Weiming Hu, Wanli Liu, Shuojia Zou, Marcin Grzegorzek

In this paper, we first briefly review the development of Convolutional Neural Networks and Visual Transformers in deep learning, and introduce the sources and development of conventional noise and adversarial attacks.

Classification, Image Classification

A Simple and Strong Baseline for Universal Targeted Attacks on Siamese Visual Tracking

no code implementations • 6 May 2021 • Zhenbang Li, Yaya Shi, Jin Gao, Shaoru Wang, Bing Li, Pengpeng Liang, Weiming Hu

In this paper, we show the existence of universal perturbations that can enable targeted attacks (e.g., forcing a tracker to follow the ground-truth trajectory with specified offsets) while being video-agnostic and free from inference in a network.

Visual Tracking

GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection

no code implementations • 29 Apr 2021 • HaoYuan Chen, Chen Li, Ge Wang, Xiaoyan Li, Md Rahaman, Hongzan Sun, Weiming Hu, Yixin Li, Wanli Liu, Changhao Sun, Shiliang Ai, Marcin Grzegorzek

In this paper, a multi-scale visual transformer model, referred to as GasHis-Transformer, is proposed for Gastric Histopathological Image Detection (GHID), enabling the automatic global detection of gastric cancer images.

Adversarial Attack, General Classification (+3 more)

PDNet: Toward Better One-Stage Object Detection With Prediction Decoupling

1 code implementation • 28 Apr 2021 • Li Yang, Yan Xu, Shaoru Wang, Chunfeng Yuan, Ziqi Zhang, Bing Li, Weiming Hu

However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different.

Object, Object Detection (+1 more)

Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model

1 code implementation • ECCV 2020 • Yufan Liu, Minglang Qiao, Mai Xu, Bing Li, Weiming Hu, Ali Borji

Inspired by the findings of our investigation, we propose a novel multi-modal video saliency model consisting of three branches: visual, audio and face.

Saliency Prediction

Open-book Video Captioning with Retrieve-Copy-Generate Network

no code implementations • CVPR 2021 • Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu

Due to the rapid emergence of short videos and the requirement for content understanding and creation, the video captioning task has received increasing attention in recent years.

Retrieval, Video Captioning

Weather Analogs with a Machine Learning Similarity Metric for Renewable Resource Forecasting

no code implementations • 8 Mar 2021 • Weiming Hu, Guido Cervone, George Young, Luca Delle Monache

The core of the AnEn (Analog Ensemble) technique is a similarity metric that sorts historical forecasts with respect to a new target prediction.

BIG-bench Machine Learning, Feature Selection

Using Long Short-Term Memory (LSTM) and Internet of Things (IoT) for localized surface temperature forecasting in an urban environment

no code implementations • 4 Feb 2021 • Manzhu Yu, Fangcao Xu, Weiming Hu, Jian Sun, Guido Cervone

Meanwhile, by using IoT observations, the spatial resolution of air temperature predictions is significantly improved.

Rethinking the competition between detection and ReID in Multi-Object Tracking

4 code implementations • 23 Oct 2020 • Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Shuyuan Zhu, Weiming Hu

However, the inherent differences and relations between detection and re-identification (ReID) are inadvertently overlooked because the two are treated as isolated tasks in the one-shot tracking paradigm.

Ranked #1 on Multi-Object Tracking on HiEve (using extra training data)

Multi-Object Tracking

Towards Accurate Pixel-wise Object Tracking by Attention Retrieval

1 code implementation • 6 Aug 2020 • Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng

We first build a look-up table (LUT) with the ground-truth mask in the starting frame, and then retrieve from the LUT to obtain an attention map for spatial constraints.

Object, Object Tracking (+2 more)

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

no code implementations • CVPR 2020 • Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha

In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy.

Ranked #9 on Video Captioning on VATEX (using extra training data)

Language Modelling, Video Captioning

RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation

1 code implementation • 11 Dec 2019 • Shaoru Wang, Yongchao Gong, Junliang Xing, Lichao Huang, Chang Huang, Weiming Hu

To make these two tasks benefit each other, we design a two-stream structure that jointly learns features at both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks).

Instance Segmentation, Object (+5 more)

Probabilistic Forecasting using Deep Generative Models

no code implementations • 26 Sep 2019 • Alessandro Fanfarillo, Behrooz Roozitalab, Weiming Hu, Guido Cervone

In order to provide a meaningful probabilistic forecast, the AnEn method requires storing a historical set of past predictions and observations in memory for a period of at least several months and spanning the seasons relevant for the prediction of interest.

Multimodal Semantic Attention Network for Video Captioning

no code implementations • 8 May 2019 • Liang Sun, Bing Li, Chunfeng Yuan, Zheng-Jun Zha, Weiming Hu

Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), which is a new encoder-decoder framework incorporating multimodal semantic attributes for video captioning.

Attribute, General Classification (+2 more)

Fast Online Object Tracking and Segmentation: A Unifying Approach

3 code implementations • CVPR 2019 • Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H. S. Torr

In this paper, we illustrate how to perform both visual object tracking and semi-supervised video object segmentation in real time with a single simple approach.

Object, Real-Time Visual Tracking (+4 more)

Visual Tracking via Spatially Aligned Correlation Filters Network

no code implementations • ECCV 2018 • Mengdan Zhang, Qiang Wang, Junliang Xing, Jin Gao, Peixi Peng, Weiming Hu, Steve Maybank

Correlation filter-based trackers rely on a periodic assumption about the search sample to efficiently distinguish the target from the background.

Visual Tracking

Distractor-aware Siamese Networks for Visual Object Tracking

1 code implementation • ECCV 2018 • Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu

During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors.

Incremental Learning, Object (+2 more)

Interaction-aware Spatio-temporal Pyramid Attention Networks for Action Classification

no code implementations • ECCV 2018 • Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, Weiming Hu

Furthermore, since different layers in a deep network capture feature maps of different scales, we use these feature maps to construct a spatial pyramid and then utilize multi-scale information to obtain more accurate attention scores, which are used to weight the local features in all spatial positions of feature maps to calculate attention maps.

Action Classification, Classification (+1 more)

Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking

2 code implementations • CVPR 2018 • Qiang Wang, Zhu Teng, Junliang Xing, Jin Gao, Weiming Hu, Stephen Maybank

The RASNet model reformulates the correlation filter within a Siamese tracking framework, and introduces different kinds of attention mechanisms to adapt the model without updating it online.

Object Tracking, Representation Learning (+1 more)

Deep Cost-Sensitive and Order-Preserving Feature Learning for Cross-Population Age Estimation

no code implementations • CVPR 2018 • Kai Li, Junliang Xing, Chi Su, Weiming Hu, Yundong Zhang, Stephen Maybank

First, a novel cost-sensitive multi-task loss function is designed to learn transferable aging features by training on the source population.

Age Estimation

Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos

no code implementations • CVPR 2017 • Yang Du, Chunfeng Yuan, Bing Li, Weiming Hu, Stephen Maybank

In dynamic object detection, it is challenging to construct an effective model to sufficiently characterize the spatial-temporal properties of the background.

Object Detection

DCFNet: Discriminant Correlation Filters Network for Visual Tracking

5 code implementations • 13 Apr 2017 • Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu

In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.

Object Tracking, Visual Tracking

Tensor Power Iteration for Multi-Graph Matching

no code implementations • CVPR 2016 • Xinchu Shi, Haibin Ling, Weiming Hu, Junliang Xing, Yanning Zhang

Due to its wide range of applications, matching between two graphs has been extensively studied and remains an active topic.

Graph Matching

Local Subspace Collaborative Tracking

no code implementations • ICCV 2015 • Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie Zhou

To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.

Object, Object Tracking (+1 more)

Multi-target Tracking with Motion Context in Tensor Power Iteration

no code implementations • CVPR 2014 • Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan, Junliang Xing

In this paper, we model interactions between neighboring targets by pairwise motion context, and further encode such context into the global association optimization.

Towards Multi-view and Partially-Occluded Face Alignment

no code implementations • CVPR 2014 • Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan

During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for current estimated landmarks.

Face Alignment

Multi-target Tracking by Rank-1 Tensor Approximation

no code implementations • CVPR 2013 • Xinchu Shi, Haibin Ling, Junliang Xing, Weiming Hu

In this paper, we formulate multi-target tracking (MTT) as a rank-1 tensor approximation problem and propose an ℓ1-norm tensor power iteration solution.

Illumination Estimation Based on Bilayer Sparse Coding

no code implementations • CVPR 2013 • Bing Li, Weihua Xiong, Weiming Hu, Houwen Peng

In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low level color distribution and high level image scene content simultaneously.

Color Constancy

Multi-task Sparse Learning with Beta Process Prior for Action Recognition

no code implementations • CVPR 2013 • Chunfeng Yuan, Weiming Hu, Guodong Tian, Shuang Yang, Haoran Wang

In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework which aims to construct a test sample with multiple features from as few bases as possible.

Action Recognition, Sparse Learning (+1 more)
