Search Results for author: Dongxu Li

Found 28 papers, 18 papers with code

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

6 code implementations • 28 Jan 2022 • Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi

Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.

Ranked #3 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)

Image Captioning • Image-text Matching +5
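For the BLIP entry above, a minimal captioning sketch follows. It assumes the Hugging Face transformers port of BLIP and the Salesforce/blip-image-captioning-base checkpoint rather than the original release, and the local image path is a placeholder.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed checkpoint: the base BLIP captioning model published on the Hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```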

LAVIS: A Library for Language-Vision Intelligence

1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Benchmarking • Image Captioning +8
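As a usage illustration for LAVIS, the sketch below follows the library's documented load-and-run pattern; the specific model name, model type, and image path are assumptions and may differ from the current release.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed model identifiers; LAVIS ships several captioning and VQA variants.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")  # placeholder image path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))  # e.g. ["a caption of the image"]
```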

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi

To address this issue, we propose \emph{Img2Prompt}, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering • Visual Question Answering +1
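To make the plug-and-play idea concrete, here is a purely hypothetical sketch of how image-derived captions and synthetic QA exemplars could be assembled into a prompt for a frozen LLM; the function and its inputs are illustrative, not the released Img2Prompt API.

```python
def build_vqa_prompt(captions, exemplar_qa_pairs, question):
    """Hypothetical sketch of the Img2Prompt idea: describe the image in text
    (captions) and prepend synthetic question-answer exemplars so that a
    frozen LLM can answer a visual question without end-to-end training."""
    context = " ".join(captions)
    exemplars = "\n".join(f"Question: {q} Answer: {a}" for q, a in exemplar_qa_pairs)
    return f"Contexts: {context}\n{exemplars}\nQuestion: {question} Answer:"


prompt = build_vqa_prompt(
    captions=["a man riding a bicycle down a street"],
    exemplar_qa_pairs=[("What is the man doing?", "riding a bicycle")],
    question="Where is the man?",
)
# The resulting string is passed to any frozen LLM's text-completion interface.
```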

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

1 code implementation • NeurIPS 2023 • Dongxu Li, Junnan Li, Steven C. H. Hoi

Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generate new subject renditions.

Representation Learning • Text-to-Image Generation
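A purely conceptual sketch of the subject-conditioning idea follows; the module and tensor shapes are assumptions for illustration and do not reflect the released BLIP-Diffusion code.

```python
import torch
import torch.nn as nn

class SubjectConditioning(nn.Module):
    """Hypothetical sketch: subject tokens produced by a multimodal encoder are
    projected and appended to the text-prompt embedding, so the diffusion
    model is conditioned on both the prompt and the subject's appearance."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, text_embeds, subject_embeds):
        # text_embeds: (B, L_text, D); subject_embeds: (B, L_subj, D)
        return torch.cat([text_embeds, self.proj(subject_embeds)], dim=1)
```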

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

1 code implementation • 3 Jan 2024 • David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo

This work presents Moonshot, a new video generation model that conditions simultaneously on multimodal inputs of image and text.

Image Animation • Video Editing +1
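The sketch below illustrates, in hypothetical form, how one cross-attention layer can condition video latents on image and text tokens at the same time; it is not the Moonshot architecture, only a minimal stand-in for the idea of simultaneous multimodal conditioning.

```python
import torch
import torch.nn as nn

class MultimodalCrossAttention(nn.Module):
    """Hypothetical sketch: video latents attend over the concatenation of
    image-condition tokens and text-condition tokens in a single pass."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latents, image_tokens, text_tokens):
        # latents: (B, N, D); image_tokens: (B, Li, D); text_tokens: (B, Lt, D)
        cond = torch.cat([image_tokens, text_tokens], dim=1)
        out, _ = self.attn(query=latents, key=cond, value=cond)
        return latents + out  # residual update of the video latents
```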

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

2 code implementations • 24 Oct 2019 • Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li

Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performance in large-scale scenarios.

Action Classification • Benchmarking +3

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

1 code implementation • CVPR 2022 • Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi

To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.

Entity Alignment • Retrieval +3
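As a hypothetical illustration of the entity prompter, the sketch below instantiates prompts with entity names, embeds them with an assumed text encoder, and scores them against video-crop features; names and shapes are illustrative only.

```python
import torch
import torch.nn.functional as F

def entity_pseudo_labels(crop_features, entity_names, text_encoder):
    """Hypothetical sketch: cosine similarity between video-crop features and
    text prompts instantiated with entity names yields a soft distribution
    over entities, usable as a pseudo-label for region-entity alignment."""
    prompts = [f"A video of a {name}" for name in entity_names]
    text_features = text_encoder(prompts)            # (E, D), assumed encoder
    sim = F.cosine_similarity(
        crop_features.unsqueeze(1),                  # (N, 1, D)
        text_features.unsqueeze(0),                  # (1, E, D)
        dim=-1,
    )                                                # (N, E)
    return sim.softmax(dim=-1)
```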

cosFormer: Rethinking Softmax in Attention

3 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, softmax attention helps capture long-range dependencies, yet it prohibits scaling up because of its quadratic space and time complexity with respect to the sequence length.

D4RL • Language Modelling +1
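For intuition, here is a minimal sketch of the kernel-based linear attention that cosFormer builds on, using a ReLU feature map; the cosine-based re-weighting that gives cosFormer its name is omitted, so this is a simplification under stated assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal kernel-based linear attention: replacing softmax with a
    non-negative feature map lets the (K, V) summary be computed once,
    giving O(n d^2) cost instead of softmax attention's O(n^2 d)."""
    q, k = F.relu(q), F.relu(k)                    # non-negative feature maps
    kv = torch.einsum("bnd,bne->bde", k, v)        # sum over positions of k_n v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

# Shapes: q, k of size (batch, n, d) and v of size (batch, n, e).
```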

Toeplitz Neural Network for Sequence Modeling

2 code implementations • 8 May 2023 • Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Sequence modeling has important applications in natural language processing and computer vision.

Language Modelling • Position
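Below is a hypothetical sketch of Toeplitz token mixing, the core idea behind a Toeplitz neural network: the mixing matrix depends only on relative position, so it needs O(n) parameters. The dense construction here is for clarity; the paper computes the same product far more efficiently.

```python
import torch

def toeplitz_mixing(x, t):
    """Hypothetical sketch: output position i is a weighted sum of inputs j
    with weight t[i - j], i.e. the n x n mixing matrix is Toeplitz.
    x: (batch, n, d); t: (2n - 1,) weights indexed by relative offset."""
    _, n, _ = x.shape
    idx = torch.arange(n)
    T = t[idx[:, None] - idx[None, :] + n - 1]   # (n, n) Toeplitz matrix
    return torch.einsum("ij,bjd->bid", T, x)

# Example: mix a batch of 2 sequences of length 5 with 3-dim features.
x = torch.randn(2, 5, 3)
t = torch.randn(2 * 5 - 1)
y = toeplitz_mixing(x, t)   # (2, 5, 3)
```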

The Devil in Linear Transformer

1 code implementation • 19 Oct 2022 • Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modelling • Text Classification
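As a hedged sketch of a normalization-based remedy for the unbounded-gradient issue described above, the snippet below drops the data-dependent denominator of linear attention and applies a norm layer to the raw output instead; treat it as an illustration of the general idea, not as the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalized_linear_attention(q, k, v, norm: nn.LayerNorm):
    """Hypothetical sketch: skip the per-query denominator (a source of
    unbounded gradients when it becomes small) and instead normalize the
    raw linear-attention output Q (K^T V) with a norm layer."""
    q, k = F.relu(q), F.relu(k)
    kv = torch.einsum("bnd,bne->bde", k, v)      # (d, e) summary per batch
    out = torch.einsum("bnd,bde->bne", q, kv)    # un-normalized attention output
    return norm(out)                             # norm over the feature dimension
```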

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning

TODE-Trans: Transparent Object Depth Estimation with Transformer

1 code implementation • 18 Sep 2022 • Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, Bin Li

We observe that the global characteristics of the transformer make it easier to extract contextual information to perform depth estimation of transparent areas.

Depth Estimation • Object +2

Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework

1 code implementation • 23 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu

Video deraining is an important task in computer vision as the unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems.

Rain Removal

Transcribing Natural Languages for The Deaf via Neural Editing Programs

1 code implementation • 17 Dec 2021 • Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li

This work studies the task of glossification, the aim of which is to transcribe natural spoken-language sentences into ordered sign language glosses for the Deaf (hard-of-hearing) community.

Sentence
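To illustrate the editing-program view of glossification, here is a hypothetical interpreter for a tiny instruction set (COPY, DELETE, ADD); the operation names and program format are assumptions for exposition, not the paper's actual program syntax.

```python
def apply_editing_program(tokens, program):
    """Hypothetical sketch: an editing program rewrites a spoken-language
    sentence into an ordered sequence of sign glosses by copying or
    deleting source tokens and inserting new glosses."""
    out, i = [], 0
    for op, arg in program:
        if op == "COPY":       # keep the current source token as a gloss
            out.append(tokens[i]); i += 1
        elif op == "DELETE":   # drop the current source token
            i += 1
        elif op == "ADD":      # insert a gloss not present in the source
            out.append(arg)
    return out

glosses = apply_editing_program(
    ["do", "you", "like", "to", "watch", "movies"],
    [("DELETE", None), ("COPY", None), ("COPY", None), ("DELETE", None),
     ("COPY", None), ("COPY", None), ("ADD", "QUESTION")],
)
# -> ['you', 'like', 'watch', 'movies', 'QUESTION']
```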

Transferring Cross-domain Knowledge for Video Sign Language Recognition

no code implementations • CVPR 2020 • Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li

To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features.

Sign Language Recognition

Dual Attention-in-Attention Model for Joint Rain Streak and Raindrop Removal

no code implementations • 12 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren

In addition, to further refine the result, a Differential-driven Dual Attention-in-Attention Model (D-DAiAM) is proposed with a "heavy-to-light" scheme that removes rain by addressing unsatisfactorily derained regions.

Rain Removal

Benchmarking Ultra-High-Definition Image Super-Resolution

no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang

Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.

4k • 8k +3

Automatic Gloss Dictionary for Sign Language Learners

no code implementations • ACL 2022 • Chenchen Xu, Dongxu Li, Hongdong Li, Hanna Suominen, Ben Swift

A multi-language dictionary is a fundamental tool for language learning, allowing the learner to look up unfamiliar words.

PCRED: Zero-shot Relation Triplet Extraction with Potential Candidate Relation Selection and Entity Boundary Detection

no code implementations • 26 Nov 2022 • Yuquan Lan, Dongxu Li, Yunqi Zhang, Hui Zhao, Gang Zhao

To address the above issues, we propose a novel method named PCRED for ZeroRTE with Potential Candidate Relation Selection and Entity Boundary Detection.

Boundary Detection • Relation +1

Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme

no code implementations • 27 Feb 2023 • Jianhao Huang, Dongxu Li, Chuan Huang, Xiaoqi Qin, Wei Zhang

This paper proposes a deep separate source-channel coding (DSSCC) framework for joint task- and data-oriented semantic communications (JTD-SC) and utilizes the variational autoencoder approach to solve the rate-distortion problem with semantic distortion.

Bayesian Inference • Data Compression
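As a hedged illustration of the kind of variational rate-distortion objective such a framework could optimize, the sketch below combines a reconstruction (distortion) term with a KL rate term; the loss form and weighting are assumptions, not the paper's exact objective.

```python
import torch

def rate_distortion_loss(x, x_hat, mu, logvar, lam=1.0):
    """Hypothetical sketch of a variational rate-distortion trade-off:
    distortion = mean squared reconstruction error,
    rate       = KL(q(z|x) || N(0, I)) of the variational posterior."""
    distortion = torch.mean((x - x_hat) ** 2)
    rate = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return distortion + lam * rate
```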

From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models

no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven Hoi

To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.

Question Answering • Visual Question Answering +1

Linearized Relative Positional Encoding

no code implementations • 18 Jul 2023 • Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it highlights a general paradigm for designing a broader family of relative positional encoding methods that are applicable to linear transformers.

Image Classification • Language Modelling +2

Fundamental Limitation of Semantic Communications: Neural Estimation for Rate-Distortion

no code implementations • 2 Jan 2024 • Dongxu Li, Jianhao Huang, Chuan Huang, Xiaoqi Qin, Han Zhang, Ping Zhang

For the case of an unknown semantic source distribution, where only a set of source samples is available, we propose a neural-network-based method that leverages generative networks to learn the semantic source distribution.
