Search Results for author: XiaoHu Qie

Found 25 papers, 16 papers with code

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

no code implementations • ICCV 2023 • Zidong Cao, Hao Ai, Yan-Pei Cao, Ying Shan, XiaoHu Qie, Lin Wang

The Möbius transformation is typically employed to further provide the opportunity for movement and zoom on ODIs, but applying it to the image level often results in blurry effect and aliasing problem.

Paper

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

no code implementations • ICCV 2023 • Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Eric Zhongcong Xu, Jussi Keppo, Ying Shan, XiaoHu Qie, Mike Zheng Shou

Our method enables pausing the video at any frame and rendering all scene details (dynamic humans, objects, and backgrounds) from arbitrary viewpoints.

Human-Object Interaction Detection Object

Paper

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

3 code implementations • ICCV 2023 • Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, XiaoHu Qie, Yinqiang Zheng

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

Ranked #11 on Text-based Image Editing on PIE-Bench

Text-based Image Editing

629

Paper
Code

VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis

no code implementations • 28 Mar 2023 • Yuan-Chen Guo, Yan-Pei Cao, Chen Wang, Yu He, Ying Shan, XiaoHu Qie, Song-Hai Zhang

With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level.

Paper

Accelerating Vision-Language Pretraining with Free Language Modeling

1 code implementation • CVPR 2023 • Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, XiaoHu Qie, Ping Luo

FLM successfully frees the prediction rate from the tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted.

Language Modelling Masked Language Modeling

Paper
Code

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

2 code implementations • 16 Feb 2023 • Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, XiaoHu Qie

In this paper, we aim to "dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly.

Image Generation Style Transfer

3,147

Paper
Code

RILS: Masked Visual Reconstruction in Language Semantic Space

1 code implementation • CVPR 2023 • Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang

Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.

Sentence

Paper
Code

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

no code implementations • CVPR 2023 • Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Weiming Hu, XiaoHu Qie, Jianping Wu

ViLEM then enforces the model to discriminate the correctness of each word in the plausible negative texts and further correct the wrong words via resorting to image information.

Ranked #45 on Visual Reasoning on Winoground

Contrastive Learning Retrieval +3

Paper

Order-Prompted Tag Sequence Generation for Video Tagging

no code implementations • ICCV 2023 • Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Yingmin Luo, Zekun Li, Chunfeng Yuan, Bing Li, XiaoHu Qie, Ying Shan, Weiming Hu

This paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), according to the above characteristics.

Multi-Label Classification TAG

Paper

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

no code implementations • CVPR 2023 • Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, XiaoHu Qie, Shenghua Gao

Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior.

Image Generation Text to 3D +1

Paper

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

3 code implementations • ICCV 2023 • Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, YuChao Gu, Yufei Shi, Wynne Hsu, Ying Shan, XiaoHu Qie, Mike Zheng Shou

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

Style Transfer Text-to-Video Generation +1

4,079

Paper
Code

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

no code implementations • 6 Dec 2022 • YuChao Gu, Xintao Wang, Yixiao Ge, Ying Shan, XiaoHu Qie, Mike Zheng Shou

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i.e., VQ tokenizers and generative transformers.

Conditional Image Generation

Paper

One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation

1 code implementation • 22 Nov 2022 • Chenglin Li, Yuanzhen Xie, Chenyun Yu, Bo Hu, Zang Li, Guoqiang Shu, XiaoHu Qie, Di Niu

CAT-ART boosts the recommendation performance in any target domain through the combined use of the learned global user representation and knowledge transferred from other domains, in addition to the original user embedding in the target domain.

Multi-Domain Recommender Systems Recommendation Systems +1

Paper
Code

Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

2 code implementations • 13 Oct 2022 • Guanghu Yuan, Fajie Yuan, Yudong Li, Beibei Kong, Shujie Li, Lei Chen, Min Yang, Chenyun Yu, Bo Hu, Zang Li, Yu Xu, XiaoHu Qie

Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback.

Recommendation Systems

174

Paper
Code

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes

1 code implementation • 31 May 2022 • Jia-Wei Liu, Yan-Pei Cao, Weijia Mao, Wenqiao Zhang, David Junhao Zhang, Jussi Keppo, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this paper, we present DeVRF, a novel representation to accelerate learning dynamic radiance fields.

Novel View Synthesis

177

Paper
Code

Masked Image Modeling with Denoising Contrast

1 code implementation • 19 May 2022 • Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie

Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling (MIM), there is no significant difference in essence, that is, how to design proper pretext tasks for vision dictionary look-up.

Contrastive Learning Denoising +6

Paper
Code

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

1 code implementation • 26 Apr 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo

Dominant pre-training works for video-text retrieval mainly adopt "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but ignore detailed local semantics.

Ranked #7 on Zero-Shot Video Retrieval on MSVD

Action Recognition Retrieval +6

130

Paper
Code

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

1 code implementation • CVPR 2022 • Ye Liu, Siyuan Li, Yang Wu, Chang Wen Chen, Ying Shan, XiaoHu Qie

Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable common need in the current video content explosion era.

Ranked #3 on Highlight Detection on YouTube Highlights

Highlight Detection Moment Retrieval +3

177

Paper
Code

Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

2 code implementations • 15 Mar 2022 • Guanyu Cai, Yixiao Ge, Binjie Zhang, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, XiaoHu Qie, Jianping Wu, Mike Zheng Shou

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from the raw pixels in an end-to-end manner to achieve advanced performance on downstream video-language retrieval.

Question Answering Retrieval +4

Paper
Code

All in One: Exploring Unified Video-Language Pre-training

1 code implementation • CVPR 2023 • Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we for the first time introduce an end-to-end video-language model, namely all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture.

Ranked #6 on TGIF-Transition on TGIF-QA (using extra training data)

Language Modelling Multiple-choice +10

272

Paper
Code

Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations • CVPR 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

Ranked #8 on Zero-Shot Video Retrieval on MSVD

Action Recognition Multiple-choice +8

2,986

Paper
Code

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

no code implementations • CVPR 2022 • Xixi Xu, Zhongang Qi, Jianqi Ma, Honglun Zhang, Ying Shan, XiaoHu Qie

Current research focuses mainly on English characters and digits, while few works study Chinese characters due to the lack of public large-scale and high-quality Chinese datasets, which limits the practical application scenarios of text segmentation.

Segmentation Style Transfer +2

Paper

Object-aware Video-language Pre-training for Retrieval

1 code implementation • CVPR 2022 • Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, XiaoHu Qie, Mike Zheng Shou

In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations.

Ranked #20 on Zero-Shot Video Retrieval on DiDeMo

Object Retrieval +2

Paper
Code

Graph-Based Equilibrium Metrics for Dynamic Supply-Demand Systems with Applications to Ride-sourcing Platforms

1 code implementation • 11 Feb 2021 • Fan Zhou, Shikai Luo, XiaoHu Qie, Jieping Ye, Hongtu Zhu

How to dynamically measure the local-to-global spatio-temporal coherence between demand and supply networks is a fundamental task for ride-sourcing platforms, such as DiDi.

Optimization and Control Applications

Paper
Code

Spatio-Temporal Hierarchical Adaptive Dispatching for Ridesharing Systems

no code implementations • 4 Sep 2020 • Chang Liu, Jiahui Sun, Haiming Jin, Meng Ai, Qun Li, Cheng Zhang, Kehua Sheng, Guobin Wu, XiaoHu Qie, Xinbing Wang

Thus, in this paper, we exploit adaptive dispatching intervals to boost the platform's profit under a guarantee of the maximum passenger waiting time.

Paper
