no code implementations • ACL 2022 • Chenchen Xu, Dongxu Li, Hongdong Li, Hanna Suominen, Ben Swift
A multi-language dictionary is a fundamental tool for language learning, allowing the learner to look up unfamiliar words.
1 code implementation • 8 Oct 2024 • Dongxu Li, Yudong Liu, HaoNing Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li
Information comes in diverse modalities.
no code implementations • 31 Jul 2024 • Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu
Furthermore, to scale up the number of events and RGB data pairs for training, we also propose a pipeline for synthesizing event data from static RGB images.
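The core of most event-synthesis pipelines is the event-camera contrast model: an event fires wherever the log-intensity change between frames exceeds a threshold. The sketch below is a minimal illustration of that standard model (function name, threshold value, and two-frame setup are mine), not the authors' full pipeline, which additionally simulates motion from static RGB images.

```python
import numpy as np

def synthesize_events(frame0, frame1, threshold=0.2, eps=1e-6):
    """Emit events where the log-intensity change between two
    grayscale frames exceeds a contrast threshold (standard
    event-camera model; polarity is the sign of the change)."""
    log0 = np.log(frame0.astype(np.float64) + eps)
    log1 = np.log(frame1.astype(np.float64) + eps)
    diff = log1 - log0
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[ys, xs]).astype(np.int64)
    # Each event: (x, y, polarity); timestamps would come from
    # interpolating between frames in a full simulator.
    return np.stack([xs, ys, polarity], axis=1)
```

A full simulator would interpolate intensities between frames to assign per-event timestamps; this sketch only shows where events fire and with which polarity.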
1 code implementation • 22 Jul 2024 • HaoNing Wu, Dongxu Li, Bei Chen, Junnan Li
In addition, our results indicate that performance on the benchmark improves only when models are capable of processing more frames, positioning LongVideoBench as a valuable benchmark for evaluating future-generation long-context LMMs.

2 code implementations • 16 Jun 2024 • Libo Wang, Dongxu Li, Sijun Dong, Xiaoliang Meng, Xiaokang Zhang, Danfeng Hong
Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications.
2 code implementations • 3 Jan 2024 • David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo
This work presents Moonshot, a new video generation model that conditions simultaneously on multimodal inputs of image and text.
no code implementations • 2 Jan 2024 • Dongxu Li, Jianhao Huang, Chuan Huang, Xiaoqi Qin, Han Zhang, Ping Zhang
For the case of an unknown semantic source distribution, where only a set of source samples is available, we propose a neural-network-based method that leverages generative networks to learn the semantic source distribution.
2 code implementations • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
To enable this framework, we devise a scalable pipeline that automatically generates high-quality, instruction-tuning datasets from readily available captioning data across different modalities, and contribute 24K QA data for audio and 250K QA data for 3D.
no code implementations • 18 Jul 2023 • Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong
Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers.
1 code implementation • NeurIPS 2023 • Dongxu Li, Junnan Li, Steven C. H. Hoi
Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generates new subject renditions.
Ranked #6 on Personalized Image Generation on DreamBooth
4 code implementations • NeurIPS 2023 • Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.
Ranked #5 on Visual Question Answering on BenchLMM
2 code implementations • 8 May 2023 • Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong
Sequence modeling has important applications in natural language processing and computer vision.
no code implementations • 27 Feb 2023 • Jianhao Huang, Dongxu Li, Chuan Huang, Xiaoqi Qin, Wei Zhang
This paper proposes a deep separate source-channel coding (DSSCC) framework for joint task- and data-oriented semantic communications (JTD-SC) and adopts a variational autoencoder approach to solve the rate-distortion problem with semantic distortion.
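In a variational autoencoder, the rate-distortion trade-off appears directly in the training objective: the reconstruction error plays the role of distortion and the KL divergence between the posterior and the prior plays the role of rate. The snippet below is an illustrative sketch of that generic Gaussian-VAE objective (function name and β-weighting are mine), not the DSSCC framework itself.

```python
import numpy as np

def vae_rate_distortion_loss(x, x_hat, mu, logvar, beta=1.0):
    """Rate-distortion objective of a Gaussian VAE:
    distortion = squared reconstruction error,
    rate = KL( N(mu, exp(logvar)) || N(0, I) ) in closed form."""
    distortion = np.sum((x - x_hat) ** 2)
    rate = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return distortion + beta * rate
```

Sweeping β traces out the rate-distortion curve: larger β penalizes the rate term more heavily, yielding a more compressed latent code at the cost of higher distortion.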
17 code implementations • ICML 2023 • Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Ranked #1 on Image-to-Text Retrieval on MS COCO
no code implementations • CVPR 2023 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven Hoi
To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
3 code implementations • 21 Dec 2022 • Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, DaCheng Tao, Steven C. H. Hoi
To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training.
no code implementations • 26 Nov 2022 • Yuquan Lan, Dongxu Li, Yunqi Zhang, Hui Zhao, Gang Zhao
To address the above issues, we propose a novel method named PCRED for ZeroRTE with Potential Candidate Relation Selection and Entity Boundary Detection.
1 code implementation • 19 Oct 2022 • Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.
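Kernel-based linear attention replaces softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), so the n×n score matrix is never materialized and the cost drops from O(n²d) to O(nd²). The sketch below is an illustrative implementation of that generic kernel trick (the feature map `phi` is my own simple choice, not the paper's); the normalizer `z` it computes is exactly the quantity whose near-zero values contribute to the unbounded gradients the paper identifies.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernel-based linear attention: softmax(QK^T)V is replaced by
    phi(Q)(phi(K)^T V) / (phi(Q) sum_k phi(K)), computed in O(n d^2)
    instead of O(n^2 d) by contracting keys with values first."""
    phi = lambda x: np.maximum(x, 0) + 1e-3   # simple positive feature map (illustrative)
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                 # (d, d_v): global key-value summary
    z = Qp @ Kp.sum(axis=0)       # (n,): normalizer; near-zero values here
                                  # are one source of unbounded gradients
    return (Qp @ kv) / (z[:, None] + eps)
```

Because the per-query weights are nonnegative and normalized, each output row remains a weighted average of value rows, mirroring softmax attention's behavior at linear cost.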
1 code implementation • 18 Sep 2022 • Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, Bin Li
We observe that the global characteristics of the transformer make it easier to extract contextual information to perform depth estimation of transparent areas.
1 code implementation • 15 Sep 2022 • Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.
3 code implementations • ICLR 2022 • Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
As one of its core components, softmax attention helps capture long-range dependencies, yet prohibits scaling up due to its quadratic space and time complexity with respect to sequence length.
Ranked #4 on Offline RL on D4RL
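For contrast with the linear variants above, vanilla softmax attention must materialize the full n×n score matrix, which is precisely the quadratic bottleneck the abstract describes. This is a minimal numpy sketch of the standard mechanism, not the paper's proposed linearized method.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla softmax attention: materializes the full n x n score
    matrix, hence O(n^2) time and memory in sequence length n --
    the bottleneck that linear-attention variants remove."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Doubling the sequence length quadruples the size of `scores`, which is why long-sequence work focuses on avoiding this matrix entirely.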
9 code implementations • 28 Jan 2022 • Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Ranked #3 on Open Vocabulary Attribute Detection on OVAD-Box benchmark (using extra training data)
1 code implementation • CVPR 2022 • Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi
To achieve this, we first introduce an entity prompter module, which is trained with VTC to produce the similarity between a video crop and text prompts instantiated with entity names.
Ranked #19 on Zero-Shot Video Retrieval on DiDeMo
1 code implementation • 17 Dec 2021 • Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li
This work studies the task of glossification, the aim of which is to transcribe natural spoken language sentences into ordered sign language glosses for the Deaf and hard-of-hearing community.
1 code implementation • 11 Oct 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Jingyu Liu, Jiankang Deng, Wei Liu, Stefanos Zafeiriou
It is thus unclear how these algorithms perform on public face hallucination datasets.
Ranked #1 on Image Super-Resolution on WLFW
1 code implementation • 23 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu
Video deraining is an important task in computer vision as the unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems.
no code implementations • 12 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren
In addition, to further refine the result, a Differential-driven Dual Attention-in-Attention Model (D-DAiAM) is proposed with a "heavy-to-light" scheme that removes rain by addressing unsatisfactorily derained regions.
no code implementations • CVPR 2021 • Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.
2 code implementations • NeurIPS 2020 • Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li
Sign language translation (SLT) aims to interpret sign video sequences into text-based natural language sentences.
no code implementations • CVPR 2020 • Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li
To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features.
3 code implementations • 24 Oct 2019 • Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li
Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performances in large scale scenarios.
Ranked #4 on Sign Language Recognition on WLASL100