no code implementations • 27 Aug 2023 • Xiujun Shu, Wei Wen, Liangsheng Xu, Mingbao Lin, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun
In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping.
1 code implementation • ICCV 2023 • Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun
Under this glance-annotation setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA).
Ranked #10 on Temporal Sentence Grounding on Charades-STA
1 code implementation • ICCV 2023 • Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun
Image retrieval aims to find images in a database that are visually similar to the query image.
1 code implementation • CVPR 2023 • Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren
Building on existing work, this paper makes two observations: (1) highlight labels vary across annotators, and this uncertainty makes annotation inaccurate and time-consuming.
no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem
Temporal video segmentation is the go-to approach for automatic video analysis: it decomposes a long-form video into smaller components for follow-up understanding tasks.
no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren
Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.
1 code implementation • 18 Aug 2022 • Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang
To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).
no code implementations • 12 Aug 2022 • Xiujun Shu, Wei Wen, Taian Guo, Sunan He, Chen Wu, Ruizhi Qiao
This technical report presents the third-place solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022.
1 code implementation • 5 Jul 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia
Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.
Ranked #1 on Multi-label zero-shot learning on Open Images V4
1 code implementation • CVPR 2022 • Haoqian Wu, Keyu Chen, Yanan Luo, Ruizhi Qiao, Bo Ren, Haozhe Liu, Weicheng Xie, Linlin Shen
Additionally, we propose a fairer and more reasonable benchmark for evaluating Video Scene Segmentation methods.
1 code implementation • CVPR 2022 • Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma
To address the huge labeling cost of large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework for the weakly-supervised setting, which achieves performance competitive with its fully-supervised counterpart.
no code implementations • 27 Nov 2021 • Xiujun Shu, Yusheng Tao, Ruizhi Qiao, Bo Ke, Wei Wen, Bo Ren
It is by far the largest dataset for person search in media.
no code implementations • 18 Jun 2021 • Chengwei Chen, Yuan Xie, Shaohui Lin, Ruizhi Qiao, Jian Zhou, Xin Tan, Yi Zhang, Lizhuang Ma
Moreover, because it is trained in a non-adversarial manner, our model trains more stably than adversarial-based novelty detection methods.
7 code implementations • CVPR 2021 • Haiyan Wu, Yanyun Qu, Shaohui Lin, Jian Zhou, Ruizhi Qiao, Zhizhong Zhang, Yuan Xie, Lizhuang Ma
In this paper, we propose a novel contrastive regularization (CR), built upon contrastive learning, that exploits information from both hazy and clear images by using them as negative and positive samples, respectively.
Ranked #5 on Image Dehazing on RS-Haze
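To make the contrastive regularization idea concrete, here is a minimal sketch, assuming CR is expressed as a weighted sum over feature layers of the ratio d(output, clear) / d(output, hazy) with an L1 distance. Under that assumption the dehazed output acts as the anchor, the clear image as the positive, and the hazy input as the negative. The function names, the `weights` and `eps` parameters, and the use of plain Python lists as feature vectors are hypothetical illustrations, not the paper's code. In practice the features would typically come from a fixed pretrained feature extractor; this sketch simply takes them as given.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def contrastive_regularization(
    anchor_feats: Sequence[Sequence[float]],    # restored (dehazed) output, one vector per layer
    positive_feats: Sequence[Sequence[float]],  # clear ground-truth image (positive sample)
    negative_feats: Sequence[Sequence[float]],  # hazy input image (negative sample)
    weights: Sequence[float],                   # hypothetical per-layer weights
    eps: float = 1e-7,                          # guards against division by zero
) -> float:
    """Weighted sum over layers of d(anchor, positive) / (d(anchor, negative) + eps).

    Minimizing this pulls the output toward the clear image and pushes it
    away from the hazy image in feature space.
    """
    if not (len(anchor_feats) == len(positive_feats) == len(negative_feats) == len(weights)):
        raise ValueError("need one feature vector and one weight per layer")
    loss = 0.0
    for w, a, p, n in zip(weights, anchor_feats, positive_feats, negative_feats):
        loss += w * l1_distance(a, p) / (l1_distance(a, n) + eps)
    return loss


if __name__ == "__main__":
    clear = [[1.0, 1.0]]
    hazy = [[0.0, 0.0]]
    near_clear = [[0.9, 0.9]]  # output close to the positive
    near_hazy = [[0.1, 0.1]]   # output close to the negative
    good = contrastive_regularization(near_clear, clear, hazy, [1.0])
    bad = contrastive_regularization(near_hazy, clear, hazy, [1.0])
    print(good < bad)
```

The ratio form means the loss falls both when the output moves closer to the clear image and when it moves away from the hazy one, which is what using the two images as positive and negative samples implies.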
no code implementations • 18 Jul 2017 • Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
To overcome this visual-semantic discrepancy, this work proposes an objective function that re-aligns the distributed word embeddings with visual information by learning a neural network to map them into a new representation called visually aligned word embedding (VAWE).
no code implementations • 26 Mar 2017 • Fayao Liu, Guosheng Lin, Ruizhi Qiao, Chunhua Shen
In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs.
no code implementations • CVPR 2016 • Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
Classifying a visual concept merely from its associated online textual source, such as a Wikipedia article, is an attractive research topic in zero-shot learning because it alleviates the burden of manually collecting semantic attributes.
no code implementations • 20 Apr 2015 • Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
The introduction of low-cost RGB-D sensors has promoted research in skeleton-based human action recognition.