Search Results for author: XiaoHu Qie

Found 25 papers, 16 papers with code

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

no code implementations ICCV 2023 Zidong Cao, Hao Ai, Yan-Pei Cao, Ying Shan, XiaoHu Qie, Lin Wang

The M\"obius transformation is typically employed to further provide the opportunity for movement and zoom on ODIs, but applying it to the image level often results in blurry effect and aliasing problem.

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

3 code implementations ICCV 2023 Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, XiaoHu Qie, Yinqiang Zheng

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

Text-based Image Editing

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis

no code implementations28 Mar 2023 Yuan-Chen Guo, Yan-Pei Cao, Chen Wang, Yu He, Ying Shan, XiaoHu Qie, Song-Hai Zhang

With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level.

2k

Accelerating Vision-Language Pretraining with Free Language Modeling

1 code implementation CVPR 2023 Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo

FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted.

Language Modelling Masked Language Modeling

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

2 code implementations16 Feb 2023 Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, XiaoHu Qie

In this paper, we aim to ``dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly.

Image Generation Style Transfer

RILS: Masked Visual Reconstruction in Language Semantic Space

1 code implementation CVPR 2023 Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang

Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.

Sentence

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

no code implementations CVPR 2023 Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Weiming Hu, XiaoHu Qie, Jianping Wu

ViLEM then enforces the model to discriminate the correctness of each word in the plausible negative texts and further correct the wrong words via resorting to image information.

Contrastive Learning Retrieval +3

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

no code implementations6 Dec 2022 YuChao Gu, Xintao Wang, Yixiao Ge, Ying Shan, XiaoHu Qie, Mike Zheng Shou

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i. e., VQ tokenizers and generative transformers.

Conditional Image Generation

One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation

1 code implementation22 Nov 2022 Chenglin Li, Yuanzhen Xie, Chenyun Yu, Bo Hu, Zang Li, Guoqiang Shu, XiaoHu Qie, Di Niu

CAT-ART boosts the recommendation performance in any target domain through the combined use of the learned global user representation and knowledge transferred from other domains, in addition to the original user embedding in the target domain.

Multi-Domain Recommender Systems Recommendation Systems +1

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

2 code implementations13 Oct 2022 Guanghu Yuan, Fajie Yuan, Yudong Li, Beibei Kong, Shujie Li, Lei Chen, Min Yang, Chenyun Yu, Bo Hu, Zang Li, Yu Xu, XiaoHu Qie

Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback.

Recommendation Systems

Masked Image Modeling with Denoising Contrast

1 code implementation19 May 2022 Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie

Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling (MIM), there is no significant difference in essence, that is, how to design proper pretext tasks for vision dictionary look-up.

Contrastive Learning Denoising +6

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

1 code implementation26 Apr 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo

Dominant pre-training work for video-text retrieval mainly adopt the "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but ignore detailed local semantics.

Action Recognition Retrieval +6

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

1 code implementation CVPR 2022 Ye Liu, Siyuan Li, Yang Wu, Chang Wen Chen, Ying Shan, XiaoHu Qie

Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable common need in the current video content explosion era.

Highlight Detection Moment Retrieval +3

Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

2 code implementations15 Mar 2022 Guanyu Cai, Yixiao Ge, Binjie Zhang, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, XiaoHu Qie, Jianping Wu, Mike Zheng Shou

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from the raw pixels in an end-to-end manner to achieve advanced performance on downstream video-language retrieval.

Question Answering Retrieval +4

All in One: Exploring Unified Video-Language Pre-training

1 code implementation CVPR 2023 Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we for the first time introduce an end-to-end video-language model, namely \textit{all-in-one Transformer}, that embeds raw video and textual signals into joint representations using a unified backbone architecture.

Ranked #6 on TGIF-Transition on TGIF-QA (using extra training data)

Language Modelling Multiple-choice +10

Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations CVPR 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e. g., action recognition with linear evaluation.

Action Recognition Multiple-choice +8

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

no code implementations CVPR 2022 Xixi Xu, Zhongang Qi, jianqi ma, Honglun Zhang, Ying Shan, XiaoHu Qie

Current researches mainly focus on only English characters and digits, while few work studies Chinese characters due to the lack of public large-scale and high-quality Chinese datasets, which limits the practical application scenarios of text segmentation.

Segmentation Style Transfer +2

Object-aware Video-language Pre-training for Retrieval

1 code implementation CVPR 2022 Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations.

Object Retrieval +2

Graph-Based Equilibrium Metrics for Dynamic Supply-Demand Systems with Applications to Ride-sourcing Platforms

1 code implementation11 Feb 2021 Fan Zhou, Shikai Luo, XiaoHu Qie, Jieping Ye, Hongtu Zhu

How to dynamically measure the local-to-global spatio-temporal coherence between demand and supply networks is a fundamental task for ride-sourcing platforms, such as DiDi.

Optimization and Control Applications

Spatio-Temporal Hierarchical Adaptive Dispatching for Ridesharing Systems

no code implementations4 Sep 2020 Chang Liu, Jiahui Sun, Haiming Jin, Meng Ai, Qun Li, Cheng Zhang, Kehua Sheng, Guobin Wu, XiaoHu Qie, Xinbing Wang

Thus, in this paper, we exploit adaptive dispatching intervals to boost the platform's profit under a guarantee of the maximum passenger waiting time.

Cannot find the paper you are looking for? You can Submit a new open access paper.