no code implementations • 22 May 2025 • Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, Trong-Hieu Nguyen-Mau, Minh-Hoang Le, Minh-Khoa Le-Phan, Duy-Nam Ly, Hai-Dang Nguyen, Minh-Triet Tran, Yukang Lin, Yan Hong, Chuanbiao Song, Siyuan Li, Jun Lan, Zhichao Zhang, Xinyue Li, Wei Sun, ZiCheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai, Zitong Xu, Huiyu Duan, Jiarui Wang, Guangji Ma, Liu Yang, Lu Liu, Qiang Hu, Xiongkuo Min, Zichuan Wang, Zhenchen Tang, Bo Peng, Jing Dong, Fengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Xin Li, Minhao Lin, Haofeng Chen, Xuanxuan He, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Bo-Cheng Qiu, Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Bo Yu, Zehao Wang, Da Mu, Mingxiu Chen, Junkang Fang, Huamei Sun, Wending Zhao, Zhiyu Wang, Wang Liu, Weikang Yu, Puhong Duan, Bin Sun, Xudong Kang, Shutao Li, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Jiarong He, Zhishan Qiao, Yongqing Huang, Zewen Chen, Zhe Pang, Juan Wang, Jian Guo, Zhizhuo Shao, Ziyu Feng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Zeming Liu, Qingsong Xie, Ruichen Wang, Zhihao LI, Yuqi Liang, Jianqi Bi, Jun Luo, Junfeng Yang, Can Li, Jing Fu, Hongwei Xu, Mingrui Long, Lulin Tang
A total of 211 participants have registered in the structure track.
1 code implementation • 21 Feb 2025 • Yueting Liu, Hanshi Wang, Yunfei Lei, ZhengJun Zha, Weiming Hu, Jin Gao
We attribute this shortcoming to the scarcity of high-quality datasets in semi-structured scenes, particularly concerning pedestrian perception and prediction.
1 code implementation • 20 Feb 2025 • Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making processes into Instruction-Subtask-Action levels.
no code implementations • CVPR 2025 • Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu
Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation.
no code implementations • CVPR 2025 • Wenyang Luo, Haina Qin, Zewen Chen, Libin Wang, Dandan Zheng, Yuming Li, Yufan Liu, Bing Li, Weiming Hu
Image restoration tasks like deblurring, denoising, and dehazing usually need distinct models for each degradation type, restricting their generalization in real-world scenarios with mixed or unknown degradations.
no code implementations • 22 Nov 2024 • Tao Zhang, Ziqi Zhang, Zongyang Ma, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Yuxuan Zhao, Zehua Xie, Jin Ma, Ying Shan, Weiming Hu
Thus, multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge, effectively expanding the knowledge scope.
1 code implementation • 15 Nov 2024 • Zewen Chen, Juan Wang, Wen Wang, Sunhan Xu, Hang Xiong, Yun Zeng, Jian Guo, Shuxun Wang, Chunfeng Yuan, Bing Li, Weiming Hu
The quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial for scenarios focusing on region-level quality.
no code implementations • 11 Nov 2024 • Shubo Lin, Yutong Kou, Zirui Wu, Shaoru Wang, Bing Li, Weiming Hu, Jin Gao
Specifically, we propose a Task-specific Hybrid Matching module for a weight-shared cross-attention-based decoder that matches the targets of track queries with multiple object queries to exploit promising candidates overlooked by the self-attention mechanism.
1 code implementation • 3 Nov 2024 • Yiwei Zhang, Jin Gao, Fudong Ge, Guan Luo, Bing Li, Zhaoxiang Zhang, Haibin Ling, Weiming Hu
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car to make the results coherent and realistic.
no code implementations • 27 Sep 2024 • Jinming Lou, Wenyang Luo, Yufan Liu, Bing Li, Xinmiao Ding, Weiming Hu, Jiajiong Cao, Yuming Li, Chenguang Ma
Diffusion transformers have gained substantial interest in diffusion generative modeling due to their outstanding performance.
1 code implementation • 2 Sep 2024 • Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li
With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) gains more attention, as it can evaluate image quality in real-time on mobile devices and enhance user experience.
no code implementations • 10 Aug 2024 • Jiang Yuan, Ji Ma, Bo wang, Weiming Hu
Implicit degradation modeling-based blind super-resolution (SR) has attracted increasing attention in the community due to its excellent generalization to complex degradation scenarios and wide application range.
1 code implementation • 22 Jul 2024 • Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng
This study introduces the vTensor, an innovative tensor structure for LLM inference based on GPU virtual memory management (VMM).
no code implementations • 21 Jul 2024 • Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu
However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios, leaving the performance of MLLMs when handling realistic multiple images underexplored.
1 code implementation • 19 Jul 2024 • Yunfei Zhang, Chao Liang, Jin Gao, Zhipeng Zhang, Weiming Hu, Stephen Maybank, Xue Zhou, Liang Li
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks by incorporating the extraction of appearance features as auxiliary tasks through embedding Re-Identification task (ReID) into the detector, achieving a balance between inference speed and tracking performance.
no code implementations • 16 Jul 2024 • Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao
The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video).
no code implementations • 10 Jul 2024 • Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu
EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events.
no code implementations • CVPR 2024 • Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu
Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency.
1 code implementation • 26 Jun 2024 • Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis.
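The quad structure named above can be written down as a small record type. This is only an illustrative sketch: the class name `SentimentQuad`, the example review, and the category labels are hypothetical and are not taken from the paper or its datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentQuad:
    """One ASQP output: the four fields listed in the task definition."""
    aspect_term: str
    aspect_category: str
    opinion_term: str
    sentiment_polarity: str


# Hypothetical review and hand-written quads, for illustration only.
review = "The pasta was delicious but the waiter was rude."
quads = [
    SentimentQuad("pasta", "food quality", "delicious", "positive"),
    SentimentQuad("waiter", "service general", "rude", "negative"),
]

for quad in quads:
    print(quad)
```

A single review can yield several quads, so the prediction target is a set of such records rather than one label.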
1 code implementation • 18 Apr 2024 • Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu
In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied than the well-established lightweight architecture design methodology.
no code implementations • 22 Mar 2024 • Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao
Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation.
1 code implementation • 11 Mar 2024 • Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao
2) The lower layers of the pre-trained backbone from BEV generation are shared for visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream.
1 code implementation • 8 Mar 2024 • Zewen Chen, Haina Qin, Juan Wang, Chunfeng Yuan, Bing Li, Weiming Hu, Liang Wang
On the other hand, PromptIQA is trained on a mixed dataset with two proposed data augmentation strategies to learn diverse requirements, thus enabling it to effectively adapt to new requirements.
no code implementations • 1 Mar 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment.
no code implementations • 26 Feb 2024 • Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and combines the strengths of latent and lexicon representations for video-text retrieval.
1 code implementation • 29 Jan 2024 • Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Wenjuan Li, Bing Li, Weiming Hu
In this paper, we introduce a benchmark dataset named "NFT Top1000 Visual-Text Dataset" (NFT1000), containing 7.56 million image-text pairs collected from the 1000 most famous PFP NFT collections by sales volume on the Ethereum blockchain.
no code implementations • 19 Jan 2024 • Zewen Chen, Juan Wang, Bing Li, Chunfeng Yuan, Weiming Hu, Junxian Liu, Peng Li, Yan Wang, Youqun Zhang, Congxuan Zhang
Due to the subjective nature of image quality assessment (IQA), assessing which image has better quality among a sequence of images is more reliable than assigning an absolute mean opinion score for an image.
no code implementations • CVPR 2024 • Hanshi Wang, Zhipeng Zhang, Jin Gao, Weiming Hu
Our motivation stems from the observation that 1) existing symmetric teacher-student methods for semi-supervised 3D object detection are simple but impede the distillation performance between teacher and student because they demand an identical model structure and input data format.
2 code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma
I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.
no code implementations • 25 Dec 2023 • Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu
Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.
1 code implementation • NeurIPS 2023 • Yutong Kou, Jin Gao, Bing Li, Gang Wang, Weiming Hu, Yizheng Wang, Liang Li
To this end, we non-uniformly resize the cropped image to have a smaller input size while the resolution of the area where the target is more likely to appear is higher and vice versa.
no code implementations • 21 Aug 2023 • Cheng Feng, Congxuan Zhang, Zhen Chen, Weiming Hu, Liyue Ge
Depth sensing is of paramount importance for unmanned aerial and autonomous vehicles.
no code implementations • CVPR 2023 • Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Weiming Hu, XiaoHu Qie, Jianping Wu
ViLEM then forces the model to discriminate the correctness of each word in the plausible negative texts and to correct the wrong words by resorting to image information.
Ranked #46 on Visual Reasoning on Winoground
no code implementations • CVPR 2023 • Weiming Bai, Yufan Liu, Zhipeng Zhang, Bing Li, Weiming Hu
Observing that face manipulation may alter the relation between different facial action units (AU), we propose the Action Units Relation Learning framework to improve the generality of forgery detection.
no code implementations • CVPR 2023 • Haina Qin, Longfei Han, Weihua Xiong, Juan Wang, Wentao Ma, Bing Li, Weiming Hu
The ISP is less programmable and consists of a series of processing modules.
no code implementations • ICCV 2023 • Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Yingmin Luo, Zekun Li, Chunfeng Yuan, Bing Li, XiaoHu Qie, Ying Shan, Weiming Hu
This paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), according to the above characteristics.
no code implementations • 1 Dec 2022 • Liyu Shi, Xiaoyan Li, Weiming Hu, HaoYuan Chen, Jing Chen, Zizhen Fan, Minghe Gao, Yujie Jing, Guotao Lu, Deguo Ma, Zhiyu Ma, Qingtao Meng, Dechao Tang, Hongzan Sun, Marcin Grzegorzek, Shouliang Qi, Yueyang Teng, Chen Li
Methods: This present study provided a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg).
no code implementations • 13 Jul 2022 • Shaoru Wang, Zeming Li, Jin Gao, Liang Li, Weiming Hu
However, when facing various resource budgets in real-world applications, pretraining multiple networks of various sizes one by one incurs a huge computational burden.
no code implementations • 12 Jul 2022 • Yufan Liu, Jiajiong Cao, Bing Li, Weiming Hu, Jingting Ding, Liang Li
However, most existing knowledge distillation methods only consider homologous-architecture distillation, such as distilling knowledge from CNN to CNN.
no code implementations • 6 Jul 2022 • Yifan Lu, Ziqi Zhang, Yuxin Chen, Chunfeng Yuan, Bing Li, Weiming Hu
The task of Dense Video Captioning (DVC) aims to generate captions with timestamps for multiple events in one video.
1 code implementation • 5 Jul 2022 • Weiming Hu, Qiang Wang, Li Zhang, Luca Bertinetto, Philip H. S. Torr
In this paper we introduce SiamMask, a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method.
1 code implementation • 30 Jun 2022 • Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang
3D object detection in autonomous driving aims to reason "what" and "where" the objects of interest present in a 3D world.
Ranked #2 on Robust Camera Only 3D Object Detection on nuScenes-C
1 code implementation • 12 Jun 2022 • Shaoru Wang, Jin Gao, Bing Li, Weiming Hu
Experiments for both synthesized and real-world scenarios consistently demonstrate the effectiveness of our approach, e.g., our method increases the degraded performance of the FCOS detector from 33.6% AP to 35.6% AP on COCO.
no code implementations • 7 Jun 2022 • HaoYuan Chen, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Weiming Hu, Yixin Li, Wanli Liu, Changhao Sun, Hongzan Sun, Xinyu Huang, Marcin Grzegorzek
In addition, we conducted an ablation experiment and an interchangeability experiment to verify the ability and interchangeability of the three channels.
no code implementations • 2 Jun 2022 • Wanli Liu, Chen Li, Ning Xu, Tao Jiang, Md Mamunur Rahaman, Hongzan Sun, Xiangchen Wu, Weiming Hu, HaoYuan Chen, Changhao Sun, YuDong Yao, Marcin Grzegorzek
Cervical cytopathology image classification is an important method to diagnose cervical cancer.
2 code implementations • 28 May 2022 • Shaoru Wang, Jin Gao, Zeming Li, Xiaoqin Zhang, Weiming Hu
We also point out some defects of such pre-training, e.g., failing to benefit from large-scale pre-training data and showing inferior performance on data-insufficient downstream tasks.
no code implementations • 25 May 2022 • Weiming Hu, HaoYuan Chen, Wanli Liu, Xiaoyan Li, Hongzan Sun, Xinyu Huang, Marcin Grzegorzek, Chen Li
Ensemble learning is a way to improve the accuracy of algorithms, and finding multiple learning models with complementarity types is the basis of ensemble learning.
1 code implementation • CVPR 2022 • Li Yang, Yan Xu, Chunfeng Yuan, Wei Liu, Bing Li, Weiming Hu
They base the visual grounding on the features from pre-generated proposals or anchors, and fuse these features with the text embeddings to locate the target mentioned by the text.
no code implementations • 31 Mar 2022 • Ziqi Zhang, Yuxin Chen, Zongyang Ma, Zhongang Qi, Chunfeng Yuan, Bing Li, Ying Shan, Weiming Hu
In this paper, we propose CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration benchmark, to facilitate research and application in video titling and video retrieval in Chinese.
1 code implementation • CVPR 2022 • Zongyang Ma, Guan Luo, Jin Gao, Liang Li, Yuxin Chen, Shaoru Wang, Congxuan Zhang, Weiming Hu
Open-vocabulary object detection aims to detect novel object categories beyond the training set.
Ranked #31 on Open Vocabulary Object Detection on MSCOCO
no code implementations • 17 Feb 2022 • Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, HaoYuan Chen, Wanli Liu, YuDong Yao, Hongzan Sun, Ning Xu, Xinyu Huang, Marcin Grzegorze
Traditional machine learning methods achieve a maximum accuracy of 76.02% and deep learning methods achieve a maximum accuracy of 95.37%.
no code implementations • 7 Jan 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu
The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.
2 code implementations • CVPR 2020 • Jin Gao, Yan Lu, Xiaojuan Qi, Yutong Kou, Bing Li, Liang Li, Shan Yu, Weiming Hu
In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.
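For readers unfamiliar with the estimator mentioned above, the following is a minimal sketch of a textbook recursive least-squares (RLS) update for a linear model. It is not the authors' tracker: the class name, the `forgetting` and `init_scale` parameters, and the toy data are assumptions chosen for illustration.

```python
class RecursiveLeastSquares:
    """Textbook RLS for a linear model y ~ w . x (illustrative, not the paper's code)."""

    def __init__(self, dim, forgetting=1.0, init_scale=1e3):
        self.w = [0.0] * dim
        # P approximates the inverse of the input autocorrelation matrix.
        self.P = [[init_scale if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]
        self.lam = forgetting

    def update(self, x, y):
        n = len(x)
        # P is symmetric, so P x also gives the entries of the row vector x^T P.
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [value / denom for value in Px]
        # A-priori prediction error for the new sample.
        error = y - sum(self.w[i] * x[i] for i in range(n))
        self.w = [self.w[i] + gain[i] * error for i in range(n)]
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                   for j in range(n)] for i in range(n)]
        return error


# Toy stream generated by y = 2*x0 - 3*x1 (hypothetical data).
rls = RecursiveLeastSquares(dim=2)
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0),
           ((1.0, 1.0), -1.0), ((2.0, 1.0), 1.0)]
for x, y in samples:
    rls.update(x, y)
print(rls.w)  # weights approach [2.0, -3.0]
```

Each update costs only a matrix-vector product and never revisits earlier samples, which is the property that makes RLS attractive for online adaptation.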
1 code implementation • CVPR 2022 • Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
The datasets will be released to facilitate the development of video captioning metrics.
1 code implementation • 5 Nov 2021 • Minglang Qiao, Yufan Liu, Mai Xu, Xin Deng, Bing Li, Weiming Hu, Ali Borji
In this paper, we propose a multitask learning method for visual-audio saliency prediction and sound source localization on multi-face video by leveraging visual, audio and face information.
no code implementations • 18 Sep 2021 • Zekun Li, Yufan Liu, Bing Li, Weiming Hu, Kebin Wu, Pei Wang
CDI builds the global attention and interaction among different levels in decoupled space which also solves the problem of heavy computation.
no code implementations • ICCV 2021 • Xing Nie, Yongcheng Liu, Shaohong Chen, Jianlong Chang, Chunlei Huo, Gaofeng Meng, Qi Tian, Weiming Hu, Chunhong Pan
It can work in a purely data-driven manner and thus is capable of auto-creating a group of suitable convolutions for geometric shape modeling.
1 code implementation • ICCV 2021 • Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu
Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants.
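The cross-correlation matching operator referred to above can be sketched on raw 2D arrays as follows. Real Siamese trackers correlate learned deep feature maps rather than pixel values, so the function names and the toy inputs here are illustrative assumptions only.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map (valid positions only)."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Inner product between the template and the window at (y, x).
            score = sum(search[y + i][x + j] * template[i][j]
                        for i in range(th) for j in range(tw))
            row.append(score)
        response.append(row)
    return response


def argmax_2d(response):
    """Return the (row, col) index of the highest response."""
    best = max((value, y, x)
               for y, row in enumerate(response)
               for x, value in enumerate(row))
    return best[1], best[2]


# Toy search region containing the template pattern at offset (1, 1).
search = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
response = cross_correlate(search, template)
peak = argmax_2d(response)
print(peak)  # (1, 1)
```

The peak of the response map marks where the template matches the search region best, which is how the tracker localizes the target in each frame.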
2 code implementations • ICCV 2021 • Yuxin Chen, Ziqi Zhang, Chunfeng Yuan, Bing Li, Ying Deng, Weiming Hu
Graph convolutional networks (GCNs) have been widely used and achieved remarkable results in skeleton-based action recognition.
Ranked #13 on Skeleton Based Action Recognition on N-UCLA
1 code implementation • 4 Jun 2021 • Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Jiquan Ma, Yong Zhang, HaoYuan Chen, Wanli Liu, Changhao Sun, YuDong Yao, Hongzan Sun, Marcin Grzegorzek
In order to prove that the methods of different periods in the field of image classification have discrepancies on GasHisSDB, we select a variety of classifiers for evaluation.
no code implementations • 3 Jun 2021 • Ao Chen, Chen Li, HaoYuan Chen, Hechen Yang, Peng Zhao, Weiming Hu, Wanli Liu, Shuojia Zou, Marcin Grzegorzek
In this paper, we first briefly review the development of Convolutional Neural Network and Visual Transformer in deep learning, and introduce the sources and development of conventional noises and adversarial attacks.
no code implementations • 16 May 2021 • Wanli Liu, Chen Li, Md Mamunur Rahamana, Tao Jiang, Hongzan Sun, Xiangchen Wu, Weiming Hu, HaoYuan Chen, Changhao Sun, YuDong Yao, Marcin Grzegorzek
The results of the study indicate that deep learning models are robust to changes in the aspect ratio of cells in cervical cytopathological images.
no code implementations • 6 May 2021 • Zhenbang Li, Yaya Shi, Jin Gao, Shaoru Wang, Bing Li, Pengpeng Liang, Weiming Hu
In this paper, we show the existence of universal perturbations that can enable the targeted attack, e.g., forcing a tracker to follow the ground-truth trajectory with specified offsets, to be video-agnostic and free from inference in a network.
no code implementations • 29 Apr 2021 • HaoYuan Chen, Chen Li, Ge Wang, Xiaoyan Li, Md Rahaman, Hongzan Sun, Weiming Hu, Yixin Li, Wanli Liu, Changhao Sun, Shiliang Ai, Marcin Grzegorzek
In this paper, a multi-scale visual transformer model, referred to as GasHis-Transformer, is proposed for Gastric Histopathological Image Detection (GHID), which enables the automatic global detection of gastric cancer images.
1 code implementation • 28 Apr 2021 • Li Yang, Yan Xu, Shaoru Wang, Chunfeng Yuan, Ziqi Zhang, Bing Li, Weiming Hu
However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different.
1 code implementation • 19 Apr 2021 • Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Weiming Hu
Eventually, it helps to reload the "fake background" and repair the broken tracklets.
no code implementations • 13 Apr 2021 • Xintong Li, Weiming Hu, Chen Li, Tao Jiang, Hongzan Sun, Xiaoyan Li, Xinyu Huang, Marcin Grzegorzek
Finally, the application prospect of the analytical method in this field is discussed.
1 code implementation • ECCV 2020 • Yufan Liu, Minglang Qiao, Mai Xu, Bing Li, Weiming Hu, Ali Borji
Inspired by the findings of our investigation, we propose a novel multi-modal video saliency model consisting of three branches: visual, audio and face.
no code implementations • CVPR 2021 • Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu
Due to the rapid emergence of short videos and the requirement for content understanding and creation, the video captioning task has received increasing attention in recent years.
no code implementations • 8 Mar 2021 • Weiming Hu, Guido Cervone, George Young, Luca Delle Monache
The central core of the AnEn technique is a similarity metric that sorts historical forecasts with respect to a new target prediction.
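The sorting step described above can be sketched as a nearest-analog search. This sketch swaps in a plain Euclidean distance as the similarity metric, which is an assumption made for brevity and not the metric defined in the paper; the function name and the toy history are also hypothetical.

```python
import math


def analog_ensemble(history, target_forecast, k):
    """Return the observations paired with the k historical forecasts closest to the target.

    history: list of (forecast_vector, observation) pairs.
    The distance is plain Euclidean, an illustrative stand-in for the AnEn metric.
    """
    def distance(forecast):
        return math.sqrt(sum((f - t) ** 2
                             for f, t in zip(forecast, target_forecast)))

    # Sort past forecasts from most to least similar to the new prediction.
    ranked = sorted(history, key=lambda pair: distance(pair[0]))
    return [observation for _, observation in ranked[:k]]


# Toy archive of (forecast, verifying observation) pairs.
history = [
    ((10.0, 5.0), 11.0),
    ((20.0, 3.0), 19.5),
    ((11.0, 5.5), 12.0),
    ((30.0, 1.0), 29.0),
]
ensemble = analog_ensemble(history, (10.5, 5.0), k=2)
print(ensemble)  # [11.0, 12.0]
```

The observations attached to the most similar past forecasts form the ensemble members, so the spread of the ensemble reflects how the atmosphere actually behaved in comparable past situations.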
no code implementations • 4 Feb 2021 • Manzhu Yu, Fangcao Xu, Weiming Hu, Jian Sun, Guido Cervone
Meanwhile, by using IoT observations, the spatial resolution of air temperature predictions is significantly improved.
no code implementations • 16 Nov 2020 • Zekun Li, Yufan Liu, Bing Li, Weiming Hu
Furthermore, these two components are both plug-and-play and can be embedded in any backbone.
4 code implementations • 23 Oct 2020 • Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Shuyuan Zhu, Weiming Hu
However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm.
Ranked #1 on Multi-Object Tracking on HiEve (using extra training data)
1 code implementation • 6 Aug 2020 • Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng
We first build a look-up-table (LUT) with the ground-truth mask in the starting frame, and then retrieve the LUT to obtain an attention map for spatial constraints.
4 code implementations • ECCV 2020 • Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, Weiming Hu
In this paper, we propose a novel object-aware anchor-free network to address this issue.
Ranked #2 on Visual Object Tracking on VOT2019
no code implementations • CVPR 2020 • Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy.
Ranked #9 on Video Captioning on VATEX (using extra training data)
1 code implementation • 11 Dec 2019 • Shaoru Wang, Yongchao Gong, Junliang Xing, Lichao Huang, Chang Huang, Weiming Hu
To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks) jointly.
Ranked #96 on Instance Segmentation on COCO test-dev
1 code implementation • ICCV 2019 • Zhao Yang, Qiang Wang, Luca Bertinetto, Weiming Hu, Song Bai, Philip H. S. Torr
Unsupervised video object segmentation has often been tackled by methods based on recurrent neural networks and optical flow.
Ranked #21 on Unsupervised Video Object Segmentation on DAVIS 2016 val
no code implementations • 13 Oct 2019 • Ziqi Zhang, Yaya Shi, Jiutong Wei, Chunfeng Yuan, Bing Li, Weiming Hu
Multi-modal information is essential to describe what has happened in a video.
no code implementations • 26 Sep 2019 • Alessandro Fanfarillo, Behrooz Roozitalab, Weiming Hu, Guido Cervone
In order to provide a meaningful probabilistic forecast, the AnEn method requires storing a historical set of past predictions and observations in memory for a period of at least several months and spanning the seasons relevant for the prediction of interest.
no code implementations • 8 May 2019 • Liang Sun, Bing Li, Chunfeng Yuan, Zheng-Jun Zha, Weiming Hu
Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), which is a new encoder-decoder framework incorporating multimodal semantic attributes for video captioning.
3 code implementations • CVPR 2019 • Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H. S. Torr
In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach.
Ranked #3 on Visual Object Tracking on YouTube-VOS 2018
no code implementations • ECCV 2018 • Mengdan Zhang, Qiang Wang, Junliang Xing, Jin Gao, Peixi Peng, Weiming Hu, Steve Maybank
Correlation filter-based trackers rely on a periodic assumption of the search sample to efficiently distinguish the target from the background.
1 code implementation • ECCV 2018 • Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu
During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors.
Ranked #11 on Visual Object Tracking on VOT2017/18
no code implementations • ECCV 2018 • Yang Du, Chunfeng Yuan, Bing Li, Lili Zhao, Yangxi Li, Weiming Hu
Furthermore, since different layers in a deep network capture feature maps of different scales, we use these feature maps to construct a spatial pyramid and then utilize multi-scale information to obtain more accurate attention scores, which are used to weight the local features in all spatial positions of feature maps to calculate attention maps.
2 code implementations • CVPR 2018 • Qiang Wang, Zhu Teng, Junliang Xing, Jin Gao, Weiming Hu, Stephen Maybank
The RASNet model reformulates the correlation filter within a Siamese tracking framework, and introduces different kinds of attention mechanisms to adapt the model without updating the model online.
Ranked #3 on Visual Object Tracking on OTB-2013
no code implementations • CVPR 2018 • Kai Li, Junliang Xing, Chi Su, Weiming Hu, Yundong Zhang, Stephen Maybank
First, a novel cost-sensitive multi-task loss function is designed to learn transferable aging features by training on the source population.
no code implementations • CVPR 2017 • Yang Du, Chunfeng Yuan, Bing Li, Weiming Hu, Stephen Maybank
In dynamic object detection, it is challenging to construct an effective model to sufficiently characterize the spatial-temporal properties of the background.
5 code implementations • 13 Apr 2017 • Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu
In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.
no code implementations • CVPR 2016 • Xinchu Shi, Haibin Ling, Weiming Hu, Junliang Xing, Yanning Zhang
Due to its wide range of applications, matching between two graphs has been extensively studied and remains an active topic.
no code implementations • ICCV 2015 • Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie zhou
To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.
no code implementations • CVPR 2015 • Shuang Yang, Chunfeng Yuan, Baoxin Wu, Weiming Hu, Fangshi Wang
In this paper, a multi-feature max-margin hierarchical Bayesian model (M3HBM) is proposed for action recognition.
no code implementations • CVPR 2014 • Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan, Junliang Xing
In this paper, we model interactions between neighbor targets by pair-wise motion context, and further encode such context into the global association optimization.
no code implementations • CVPR 2014 • Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for current estimated landmarks.
no code implementations • CVPR 2014 • Baoxin Wu, Chunfeng Yuan, Weiming Hu
Then, the proposed CGKs are applied to measure the similarity between actions represented by the two-graph model.
no code implementations • 4 Jan 2014 • Xi Li, Weiming Hu, Chunhua Shen, Anthony Dick, Zhongfei Zhang
Using both CAHSM and DHPC, a robust spectral clustering algorithm is developed.
no code implementations • CVPR 2013 • Chunfeng Yuan, Weiming Hu, Guodong Tian, Shuang Yang, Haoran Wang
In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning (MTSL) framework which aims to construct a test sample with multiple features from as few bases as possible.
no code implementations • CVPR 2013 • Xinchu Shi, Haibin Ling, Junling Xing, Weiming Hu
In this paper we formulate multi-target tracking (MTT) as a rank-1 tensor approximation problem and propose an ℓ1 norm tensor power iteration solution.
no code implementations • CVPR 2013 • Bing Li, Weihua Xiong, Weiming Hu, Houwen Peng
In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low level color distribution and high level image scene content simultaneously.
no code implementations • CVPR 2013 • Chunfeng Yuan, Xi Li, Weiming Hu, Haibin Ling, Stephen Maybank
In this paper, we propose a new global feature to capture the detailed geometrical distribution of interest points.