Search Results for author: Yupan Huang

Found 12 papers, 6 papers with code

TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering

no code implementations 28 Nov 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have proven to be powerful generative models in recent years, yet generating visual text remains a challenge.

Language Modelling Large Language Model +1

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

1 code implementation 31 Aug 2023 Yupan Huang, Zaiqiao Meng, Fangyu Liu, Yixuan Su, Nigel Collier, Yutong Lu

Our experiments validate the effectiveness of SparklesChat in understanding and reasoning across multiple images and dialogue turns.

Instruction Following Visual Reasoning

TextDiffuser: Diffusion Models as Text Painters

no code implementations NeurIPS 2023 Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text.

Optical Character Recognition (OCR)

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

2 code implementations 18 Apr 2022 Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei

In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.

Document AI Document Image Classification +10

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

1 code implementation 19 Oct 2021 Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

1 code implementation 19 Oct 2021 Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

We adopt Transformer as our unified architecture for its strong performance and task-agnostic design.

Text Generation Text-to-Image Generation

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations CVPR 2021 Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

Representation Learning Retrieval +3

Reinforcing Short-Length Hashing

no code implementations 24 Apr 2020 Xingbo Liu, Xiushan Nie, Qi Dai, Yupan Huang, Yilong Yin

Due to its compelling efficiency in retrieval and storage, similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval.

Image Retrieval Retrieval

Decoupling Localization and Classification in Single Shot Temporal Action Detection

1 code implementation 16 Apr 2019 Yupan Huang, Qi Dai, Yutong Lu

Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream.

Action Detection Classification +2
