Search Results for author: Longtian Qiu

Found 9 papers, 7 papers with code

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

2 code implementations • 9 May 2024 • Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Sora unveils the potential of scaling Diffusion Transformers for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

1 code implementation • 4 Jan 2024 • Longtian Qiu, Shan Ning, Xuming He

First, we observe that CLIP's visual features of image subregions can lie closer to the paired caption than the whole-image feature does, owing to the inherent information loss in text descriptions (a similarity sketch follows this entry).

Descriptive • Image Captioning • +1
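
A minimal sketch of the subregion observation above, assuming OpenAI's `clip` package; `example.jpg`, the caption string, and the center crop are all placeholders of mine, not the paper's setup:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = Image.open("example.jpg")  # hypothetical image
caption = clip.tokenize(["a dog catching a frisbee"]).to(device)  # hypothetical caption

def clip_score(img: Image.Image) -> float:
    """Cosine similarity between one image and the caption in CLIP space."""
    with torch.no_grad():
        img_feat = model.encode_image(preprocess(img).unsqueeze(0).to(device))
        txt_feat = model.encode_text(caption)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()

# Compare the full image against an arbitrary centered subregion crop.
w, h = image.size
subregion = image.crop((w // 4, h // 4, 3 * w // 4, 3 * h // 4))
print("full image :", clip_score(image))
print("subregion  :", clip_score(subregion))  # per the paper's observation, often closer
```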

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

1 code implementation • CVPR 2023 • Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He

In detail, we first introduce a novel interaction decoder that extracts informative regions from CLIP's visual feature map via a cross-attention mechanism; these regions are then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection (a cross-attention sketch follows this entry).

Decoder • Human-Object Interaction Detection • +3
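
A hedged sketch of such a decoder, not HOICLIP's actual code: learnable queries cross-attend over a flattened CLIP feature map via PyTorch's `nn.MultiheadAttention`, and a single linear layer stands in for the knowledge integration block; all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InteractionDecoderSketch(nn.Module):
    """Toy stand-in for an interaction decoder: queries cross-attend to a
    CLIP visual feature map, then fuse with detection-backbone features."""

    def __init__(self, dim: int = 512, num_queries: int = 64, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # crude "knowledge integration"

    def forward(self, clip_feat_map, backbone_feat):
        # clip_feat_map: (B, HW, dim) flattened CLIP spatial features
        # backbone_feat: (B, num_queries, dim) detection-backbone features
        B = clip_feat_map.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        attended, _ = self.cross_attn(q, clip_feat_map, clip_feat_map)
        # Fuse CLIP-derived region features with backbone features.
        return self.fuse(torch.cat([attended, backbone_feat], dim=-1))

dec = InteractionDecoderSketch()
out = dec(torch.randn(2, 49, 512), torch.randn(2, 64, 512))
print(out.shape)  # torch.Size([2, 64, 512])
```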

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

no code implementations • 27 Feb 2023 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng

In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training (a joint-masking sketch follows this entry).

Decoder • Point Cloud Pre-training • +1
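
A rough sketch of the joint 2D-3D masking idea under my own simplifying assumptions (single-point "patches" and a toy orthographic projection; the paper's actual patch grouping and projection are more involved):

```python
import torch

def joint_mask(points: torch.Tensor, mask_ratio: float = 0.6):
    """points: (N, 3) point cloud, treated here as single-point 'patches'.
    Returns visible 3D points and their 2D projections, both masked by
    the same random index set so the two modalities stay aligned."""
    n = points.size(0)
    keep = int(n * (1 - mask_ratio))
    idx = torch.randperm(n)[:keep]   # one shared mask for both modalities
    visible_3d = points[idx]         # masked 3D input
    visible_2d = visible_3d[:, :2]   # toy orthographic projection to the xy-plane
    return visible_3d, visible_2d

pts = torch.randn(1024, 3)
v3d, v2d = joint_mask(pts)
print(v3d.shape, v2d.shape)  # torch.Size([409, 3]) torch.Size([409, 2])
```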

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

1 code implementation • 28 Sep 2022 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, achieving promising accuracy on zero-shot classification (a parameter-free attention sketch follows this entry).

Training-free 3D Point Cloud Classification • Transfer Learning • +1
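
For flavor, a minimal sketch of parameter-free cross-modal attention in the spirit of CALIP's title, not its exact formulation: normalized visual and text features update each other through plain matrix products and softmax, with no trainable weights involved.

```python
import torch
import torch.nn.functional as F

def parameter_free_attention(visual, text, scale: float = 1.0):
    """visual: (HW, d) spatial features; text: (C, d) class text features.
    Inputs are assumed to be L2-normalized CLIP features; the attention
    uses only matmuls and softmax, so nothing here is trained."""
    attn = F.softmax(scale * visual @ text.T, dim=-1)  # (HW, C)
    visual_updated = visual + attn @ text              # text-aware visual features
    text_updated = text + attn.T @ visual              # visual-aware text features
    return visual_updated, text_updated

v = F.normalize(torch.randn(49, 512), dim=-1)
t = F.normalize(torch.randn(10, 512), dim=-1)
v2, t2 = parameter_free_attention(v, t)
logits = v2.mean(0) @ t2.T  # crude zero-shot class scores over 10 classes
print(logits.shape)         # torch.Size([10])
```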

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

no code implementations • 4 Dec 2021 • Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Zilu Guo, Yafeng Li, Guangnan Zhang

Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.

Language Modelling • Representation Learning • +1
