no code implementations • ACL 2022 • Chenchen Xu, Dongxu Li, Hongdong Li, Hanna Suominen, Ben Swift
A multi-language dictionary is a fundamental tool for language learning, allowing the learner to look up unfamiliar words.
1 code implementation • 24 May 2023 • Dongxu Li, Junnan Li, Steven C. H. Hoi
Then we design a subject representation learning task that enables a diffusion model to leverage such a visual representation and generate new subject renditions.
1 code implementation • 11 May 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pre-trained BLIP-2 models.
1 code implementation • 8 May 2023 • Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong
Sequence modeling has important applications in natural language processing and computer vision.
no code implementations • 27 Feb 2023 • Jianhao Huang, Dongxu Li, Chuan Huang, Xiaoqi Qin, Wei Zhang
This paper proposes a deep separate source-channel coding (DSSCC) framework for joint task- and data-oriented semantic communications (JTD-SC) and adopts a variational autoencoder approach to solve the rate-distortion problem with semantic distortion.
5 code implementations • 30 Jan 2023 • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Ranked #1 on Image Retrieval on COCO
no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven Hoi
To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
1 code implementation • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi
To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
no code implementations • 26 Nov 2022 • Yuquan Lan, Dongxu Li, Yunqi Zhang, Hui Zhao, Gang Zhao
To address the above issues, we propose a novel method named PCRED for ZeroRTE with Potential Candidate Relation Selection and Entity Boundary Detection.
1 code implementation • 19 Oct 2022 • Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation, which adversely impact the convergence of linear transformer models; and 2) attention dilution, which trivially distributes attention scores over long sequences while neglecting neighbouring structures.
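The attention-dilution issue can be illustrated with a minimal NumPy sketch (a toy with random queries and keys, not the paper's model; the ReLU feature map is just one common non-negative kernel): the linear-attention rows come out flatter, i.e. with entropy closer to log n, than the softmax rows.

```python
import numpy as np

def row_entropy(P):
    """Mean Shannon entropy of the rows of a (row-)stochastic matrix."""
    return float(-(P * np.log(P + 1e-12)).sum(-1).mean())

rng = np.random.default_rng(0)
n, d = 256, 16
Q, K = rng.standard_normal((2, n, d))

# Softmax attention: the exponential sharpens each row around its maxima.
S = Q @ K.T / np.sqrt(d)
P_soft = np.exp(S - S.max(-1, keepdims=True))
P_soft /= P_soft.sum(-1, keepdims=True)

# Kernel-based linear attention with phi = relu: scores stay near-uniform
# over long sequences -- the "attention dilution" described above.
phi = lambda x: np.maximum(x, 0.0)
A = phi(Q) @ phi(K).T
P_lin = A / (A.sum(-1, keepdims=True) + 1e-6)

# Entropy closer to log(n) means a flatter, more diluted distribution.
print(row_entropy(P_soft), row_entropy(P_lin), np.log(n))
```

On random inputs the linear-attention entropy lands noticeably closer to the log(n) ceiling, which is the dilution effect the entry refers to.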
1 code implementation • 18 Sep 2022 • Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, Bin Li
We observe that the global characteristics of the transformer make it easier to extract contextual information to perform depth estimation of transparent areas.
1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.
2 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
As one of its core components, softmax attention helps capture long-range dependencies, yet it prohibits scale-up due to quadratic space and time complexity in the sequence length.
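The quadratic cost comes from materialising the n x n attention matrix; kernel-based linearisation avoids this by reassociating the matrix products. A minimal NumPy sketch (using a generic non-negative feature map purely for illustration, not this paper's exact attention) showing the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 16
Q, K, V = rng.standard_normal((3, n, d))
phi = lambda x: np.maximum(x, 0.0) + 1e-3  # any strictly positive feature map

# Quadratic form: materialise the full n x n attention matrix.
A = phi(Q) @ phi(K).T                       # O(n^2 d) time, O(n^2) memory
out_quad = (A / A.sum(-1, keepdims=True)) @ V

# Linear form: reassociate so (phi(K)^T V) is computed first.
KV = phi(K).T @ V                           # O(n d^2) time, O(d^2) memory
Z = phi(Q) @ phi(K).sum(0)                  # per-row normaliser, shape (n,)
out_lin = (phi(Q) @ KV) / Z[:, None]

assert np.allclose(out_quad, out_lin)
```

Because matrix multiplication is associative, both orderings yield identical outputs, but the linear form never stores an n x n matrix, so cost grows linearly in the sequence length.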
5 code implementations • 28 Jan 2022 • Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #4 on Image Captioning on nocaps-val-out-domain
no code implementations • 17 Dec 2021 • Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li
This work studies the task of glossification, which aims to transcribe natural spoken language sentences into ordered sign language glosses for the Deaf (hard-of-hearing) community.
1 code implementation • CVPR 2022 • Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi
To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.
Ranked #11 on Zero-Shot Video Retrieval on DiDeMo
1 code implementation • 11 Oct 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Jingyu Liu, Jiankang Deng, Wei Liu, Stefanos Zafeiriou
It is thus unclear how these algorithms perform on public face hallucination datasets.
Ranked #1 on Image Super-Resolution on WLFW
1 code implementation • 23 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu
Video deraining is an important task in computer vision as the unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems.
no code implementations • 12 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren
In addition, to further refine the result, a Differential-driven Dual Attention-in-Attention Model (D-DAiAM) is proposed with a "heavy-to-light" scheme to remove rain via addressing the unsatisfying deraining regions.
no code implementations • CVPR 2021 • Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.
2 code implementations • NeurIPS 2020 • Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
no code implementations • CVPR 2020 • Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li
To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features.
2 code implementations • 24 Oct 2019 • Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li
Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performances in large scale scenarios.
Ranked #3 on Sign Language Recognition on WLASL100