Search Results for author: XiaoHu Qie

Found 12 papers, 8 papers with code

Weakly-supervised Action Localization via Hierarchical Mining

no code implementations 22 Jun 2022 Jia-Chang Feng, Fa-Ting Hong, Jia-Run Du, Zhongang Qi, Ying Shan, XiaoHu Qie, Wei-Shi Zheng, Jianping Wu

In this work, we propose a hierarchical mining strategy at both the video level and the snippet level, i.e., hierarchical supervision and hierarchical consistency mining, to maximize the use of the given annotations and of prediction-wise consistency. (An illustrative sketch of this kind of two-level supervision follows this entry.)

Action Localization Multiple Instance Learning +2
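
The sketch below is a generic illustration of combining video-level and snippet-level supervision: a top-k MIL video classification loss plus a consistency term between two snippet-level prediction streams. The pooling rule, the choice of streams, and the loss weight are assumptions for illustration, not the paper's actual hierarchical losses.

```python
# Illustrative sketch only, assuming a generic top-k MIL video-level loss plus a
# snippet-level consistency term; NOT the paper's exact hierarchical mining losses.
import torch
import torch.nn.functional as F

def video_level_mil_loss(snippet_logits, video_labels, k=8):
    """snippet_logits: (B, T, C) per-snippet class scores; video_labels: (B, C) multi-hot floats."""
    topk = torch.topk(snippet_logits, k=min(k, snippet_logits.size(1)), dim=1).values
    video_logits = topk.mean(dim=1)  # pool top-k snippets into a video-level prediction
    return F.binary_cross_entropy_with_logits(video_logits, video_labels)

def snippet_consistency_loss(scores_a, scores_b):
    """Encourage two snippet-level prediction streams (e.g. two modalities) to agree."""
    return F.mse_loss(torch.sigmoid(scores_a), torch.sigmoid(scores_b))

# total_loss = video_level_mil_loss(logits, labels) + 0.5 * snippet_consistency_loss(s_a, s_b)
```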

Masked Image Modeling with Denoising Contrast

no code implementations 19 May 2022 Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie

Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling, there is no significant difference in essence: both come down to designing proper pretext tasks for vision dictionary look-up. (A toy look-up-style objective is sketched after this entry.)

Contrastive Learning Denoising +6
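
To make the "vision dictionary look-up" framing concrete, the toy objective below matches features predicted at masked patch positions against target features for the same positions via InfoNCE. The student/teacher split, temperature, and feature shapes are assumptions, not the paper's actual objective.

```python
# Toy contrastive "dictionary look-up" pretext for masked patches; the target features
# (e.g. from a momentum teacher) and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def masked_patch_infonce(pred, target, temperature=0.2):
    """pred: (N, D) features predicted at masked positions; target: (N, D) targets for the same positions."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature           # each masked patch "looks up" its own target
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)
```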

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

1 code implementation 26 Apr 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo

Dominant pre-training work for video-text retrieval mainly adopts "dual-encoder" architectures to enable efficient retrieval, where two separate encoders contrast global video and text representations but ignore detailed local semantics. (A minimal sketch of this dual-encoder setup follows this entry.)

Action Recognition Text to Video Retrieval +2
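
Below is a minimal sketch of the dual-encoder baseline described above: global video and text embeddings from two separate encoders contrasted with a symmetric InfoNCE loss. The embeddings and the temperature value are placeholder assumptions.

```python
# Minimal sketch of a "dual-encoder" contrastive retrieval objective; encoder outputs
# and temperature are placeholder assumptions, not the paper's actual models.
import torch
import torch.nn.functional as F

def dual_encoder_contrastive_loss(video_emb, text_emb, temperature=0.05):
    """video_emb, text_emb: (B, D) global embeddings of matched video-text pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                   # similarity of every video to every caption
    labels = torch.arange(v.size(0), device=v.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```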

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

1 code implementation CVPR 2022 Ye Liu, Siyuan Li, Yang Wu, Chang Wen Chen, Ying Shan, XiaoHu Qie

Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable need in the current era of exploding video content.

Highlight Detection Moment Retrieval +1

Revitalize Region Feature for Democratizing Video-Language Pre-training

2 code implementations 15 Mar 2022 Guanyu Cai, Yixiao Ge, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, XiaoHu Qie, Jianping Wu, Mike Zheng Shou

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from the raw pixels in an end-to-end manner to achieve advanced performance on downstream video-language tasks.

Question Answering Text to Video Retrieval +3

All in One: Exploring Unified Video-Language Pre-training

1 code implementation 14 Mar 2022 Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we introduce, for the first time, an end-to-end video-language model, namely the all-in-one Transformer, which embeds raw video and textual signals into joint representations using a unified backbone architecture. (A rough sketch of the unified-backbone idea follows this entry.)

Language Modelling Multiple-choice +10
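
As a rough illustration of a unified backbone over raw video and text, the sketch below concatenates projected video patch tokens with embedded text tokens and feeds them through one shared Transformer encoder. Dimensions, vocabulary size, and module choices are assumptions, not the paper's architecture.

```python
# Hand-wavy sketch of a unified video-text backbone; all hyperparameters and module
# choices here are illustrative assumptions.
import torch
import torch.nn as nn

class UnifiedBackboneSketch(nn.Module):
    def __init__(self, dim=512, depth=4, heads=8, vocab=30522, patch_dim=3 * 16 * 16):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, dim)          # raw video patches -> tokens
        self.text_embed = nn.Embedding(vocab, dim)            # word ids -> tokens
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)   # one encoder shared by both modalities

    def forward(self, video_patches, text_ids):
        # video_patches: (B, N, patch_dim); text_ids: (B, L) token ids
        tokens = torch.cat([self.patch_embed(video_patches), self.text_embed(text_ids)], dim=1)
        return self.backbone(tokens)                          # joint video-text representations
```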

Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations CVPR 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

Ranked #19 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Action Recognition Multiple-choice +4

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

no code implementations CVPR 2022 Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, XiaoHu Qie

Current research mainly focuses on English characters and digits, while few works study Chinese characters due to the lack of public large-scale, high-quality Chinese datasets, which limits the practical application scenarios of text segmentation.

Style Transfer Text Segmentation +1

Object-aware Video-language Pre-training for Retrieval

1 code implementation CVPR 2022 Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we present Object-aware Transformers, an object-centric approach that extends the video-language transformer to incorporate object representations. (A sketch of injecting object tokens follows this entry.)

Text Matching
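
One simple way to read "incorporate object representations" is to project detected object-region features into the token space and append them to the patch tokens before the video-language transformer; the sketch below shows that idea. The feature dimension and the single linear projection are illustrative assumptions.

```python
# Sketch of injecting object tokens into a video-language transformer input; the
# projection and dimensions are assumptions, not the paper's actual design.
import torch
import torch.nn as nn

class ObjectTokenInjector(nn.Module):
    def __init__(self, dim=512, object_feat_dim=2048):
        super().__init__()
        self.object_proj = nn.Linear(object_feat_dim, dim)    # e.g. detector RoI features -> tokens

    def forward(self, patch_tokens, object_feats):
        # patch_tokens: (B, N, dim); object_feats: (B, M, object_feat_dim)
        object_tokens = self.object_proj(object_feats)
        return torch.cat([patch_tokens, object_tokens], dim=1)  # fed to the video-language transformer
```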

Graph-Based Equilibrium Metrics for Dynamic Supply-Demand Systems with Applications to Ride-sourcing Platforms

1 code implementation 11 Feb 2021 Fan Zhou, Shikai Luo, XiaoHu Qie, Jieping Ye, Hongtu Zhu

How to dynamically measure the local-to-global spatio-temporal coherence between demand and supply networks is a fundamental task for ride-sourcing platforms, such as DiDi.

Optimization and Control Applications
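
As a toy illustration of measuring supply-demand imbalance from local to global scales on a spatial graph (not the paper's equilibrium metric), the snippet below diffuses the per-region demand-supply gap over a row-normalized adjacency matrix and records the mean absolute imbalance at each hop.

```python
# Toy local-to-global imbalance measure on a region graph; this is NOT the paper's
# graph-based equilibrium metric, only an illustration of the underlying idea.
import numpy as np

def imbalance_profile(adj, demand, supply, hops=3):
    """adj: (n, n) row-normalized adjacency of regions; demand, supply: (n,) counts."""
    gap = (demand - supply).astype(float)
    profile = []
    spread = gap
    for _ in range(hops):
        profile.append(float(np.abs(spread).mean()))  # mean absolute imbalance at this spatial scale
        spread = adj @ spread                         # diffuse imbalance to neighboring regions
    profile.append(float(abs(gap.sum())) / len(gap))  # whole-network (global) imbalance per region
    return profile
```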

Spatio-Temporal Hierarchical Adaptive Dispatching for Ridesharing Systems

no code implementations 4 Sep 2020 Chang Liu, Jiahui Sun, Haiming Jin, Meng Ai, Qun Li, Cheng Zhang, Kehua Sheng, Guobin Wu, XiaoHu Qie, Xinbing Wang

Thus, in this paper, we exploit adaptive dispatching intervals to boost the platform's profit under a guarantee of the maximum passenger waiting time.
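
The sketch below illustrates one possible adaptive-interval rule consistent with the sentence above: keep accumulating requests to improve batched matching, but dispatch immediately once any passenger approaches the maximum-waiting-time guarantee. The thresholds and trigger conditions are assumptions, not the paper's dispatching policy.

```python
# Hypothetical adaptive dispatching-interval rule; thresholds and conditions are
# illustrative assumptions, not the policy proposed in the paper.
def should_dispatch_now(waiting_times, num_pending, num_idle_drivers,
                        max_wait=300.0, safety_margin=30.0, min_batch=5):
    """waiting_times: seconds each pending passenger has already waited."""
    if waiting_times and max(waiting_times) >= max_wait - safety_margin:
        return True   # honor the maximum passenger waiting-time guarantee
    if num_pending >= min_batch and num_idle_drivers > 0:
        return True   # enough density for a profitable batched matching round
    return False      # otherwise extend the dispatching interval and keep accumulating
```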
