Search Results for author: Hongwei Xue

Found 8 papers, 4 papers with code

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike low-level features such as pixel values, we argue that the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises a question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation • 12 Oct 2022 • Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Tasks: Contrastive Learning, Question Answering, +3 more

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

… and 2) how to mitigate the impact of these factors?

Ranked #2 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Tasks: Retrieval, Text Retrieval, +1 more

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Ranked #16 on Video Retrieval on MSR-VTT

Tasks: Retrieval, Super-Resolution, +4 more

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

1 code implementation • 19 Oct 2021 • Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

We adopt Transformer as our unified architecture for its strong performance and task-agnostic design.

Tasks: Text Generation, Text-to-Image Generation

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Tasks: Question Answering, Relation, +5 more

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Tasks: Question Answering, Relation, +3 more