1 code implementation • Findings (EMNLP) 2021 • Minghan Li, Ming Li, Kun Xiong, Jimmy Lin
Our method reaches state-of-the-art performance on 5 benchmark QA datasets, with up to 10% improvement in top-100 accuracy compared to a jointly trained multi-task DPR on SQuAD.
no code implementations • NAACL (TrustNLP) 2022 • Minghan Li, Xueguang Ma, Jimmy Lin
The bi-encoder design of the dense passage retriever (DPR) is a key factor in its success in open-domain question answering (QA), yet it is unclear how DPR’s question encoder and passage encoder individually contribute to overall performance, which we refer to as the encoder attribution problem.
no code implementations • EMNLP 2021 • Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, Jimmy Lin
Recent work has shown that dense passage retrieval techniques achieve better ranking accuracy in open-domain question answering compared to sparse retrieval techniques such as BM25, but at the cost of large space and memory requirements.
1 code implementation • 24 Mar 2025 • Chenxi Xie, Minghan Li, Hui Zeng, Jun Luo, Lei Zhang
MaSS13K features precise masks, with an average mask complexity 20-50 times higher than that of existing semantic segmentation datasets.
no code implementations • 18 Mar 2025 • Yifei Dong, Fengyi Wu, Qi He, Heng Li, Minghan Li, Zebang Cheng, Yuxuan Zhou, Jingdong Sun, Qi Dai, Zhi-Qi Cheng, Alexander G. Hauptmann
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments.
no code implementations • 17 Mar 2025 • Minghan Li, Chenxi Xie, Yichen Wu, Lei Zhang, Mengyu Wang
To address this, we introduce FiVE, a Fine-grained Video Editing Benchmark for evaluating emerging diffusion and rectified flow models.
no code implementations • 28 Jan 2025 • Minghan Li, Eric Gaussier, Guodong Zhou
In recent years, large language models (LLMs) have demonstrated exceptional power in various domains, including information retrieval.
no code implementations • 9 Dec 2024 • Jiazhao Zhang, Kunyu Wang, Shaoan Wang, Minghan Li, Haoran Liu, Songlin Wei, Zhongyuan Wang, Zhizheng Zhang, He Wang
A practical navigation agent must be capable of handling a wide range of interaction demands, such as following instructions, searching objects, answering questions, tracking people, and more.
no code implementations • 9 Nov 2024 • Minghan Li, Eric Gaussier, Juntao Li, Guodong Zhou
Comprehensive experiments on long-document datasets, including TREC 2019 DL, Robust04, and MLDR-zh, show that KeyB2 outperforms baselines such as RankLLaMA and KeyB, reducing reranking time and GPU memory usage while enhancing retrieval performance and achieving new SOTA results on TREC 2019 DL with higher NDCG@10 and MAP scores.
1 code implementation • 19 Oct 2024 • Chaodong Xiao, Minghan Li, Zhengqiang Zhang, Deyu Meng, Lei Zhang
Selective state space models (SSMs), such as Mamba, excel at capturing long-range dependencies in 1D sequential data, while their application to 2D vision tasks still faces challenges.
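For context, the core of any SSM is a linear recurrence scanned over the sequence; the minimal sketch below (NumPy, with illustrative dimensions and hand-picked parameters) shows that recurrence, omitting the input-dependent selectivity that Mamba adds on top.

```python
import numpy as np

# Minimal sketch of a discrete linear state space model (SSM) recurrence,
# the building block that selective SSMs like Mamba extend with
# input-dependent (selective) parameters. Dimensions are illustrative.
def ssm_scan(x, A, B, C):
    """x: (T, d_in) input sequence; A: (d_state, d_state);
    B: (d_state, d_in); C: (d_out, d_state)."""
    h = np.zeros(A.shape[0])      # hidden state carries long-range context
    ys = []
    for x_t in x:                 # sequential scan over the 1D sequence
        h = A @ h + B @ x_t       # state update
        ys.append(C @ h)          # readout
    return np.stack(ys)

T, d_in, d_state, d_out = 16, 4, 8, 4
rng = np.random.default_rng(0)
y = ssm_scan(rng.normal(size=(T, d_in)), 0.9 * np.eye(d_state),
             0.1 * rng.normal(size=(d_state, d_in)),
             0.1 * rng.normal(size=(d_out, d_state)))
```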
1 code implementation • 20 Aug 2024 • Zebang Cheng, Shuyuan Tu, Dawei Huang, Minghan Li, Xiaojiang Peng, Zhi-Qi Cheng, Alexander G. Hauptmann
This paper presents our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition.
1 code implementation • 27 Jun 2024 • Heng Li, Minghan Li, Zhi-Qi Cheng, Yifei Dong, Yuxuan Zhou, Jun-Yan He, Qi Dai, Teruko Mitamura, Alexander G. Hauptmann
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
no code implementations • 17 Jun 2024 • Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin
To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as a unified input format, which does not require any content extraction preprocessing and preserves all the information in a document (e.g., text, image, and layout).
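A minimal sketch of the bi-encoder setup this paradigm implies, with both encoders as hypothetical stand-ins rather than the paper's actual models: relevance is the dot product between a pooled screenshot embedding and a pooled query embedding.

```python
import torch
import torch.nn as nn

# Sketch of screenshot-based bi-encoder retrieval. Both encoders below are
# hypothetical placeholders (a patchify convolution and a bag-of-embeddings
# text encoder), not the models used in the DSE paper.
class ScreenshotEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify
    def forward(self, images):                  # (B, 3, H, W)
        patches = self.conv(images).flatten(2)  # (B, dim, num_patches)
        return patches.mean(-1)                 # pooled screenshot embedding

class QueryEncoder(nn.Module):
    def __init__(self, vocab=30522, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
    def forward(self, token_ids):               # (B, L)
        return self.emb(token_ids).mean(1)      # pooled query embedding

doc_enc, q_enc = ScreenshotEncoder(), QueryEncoder()
docs = doc_enc(torch.randn(4, 3, 224, 224))    # 4 page screenshots
queries = q_enc(torch.randint(0, 30522, (2, 12)))
scores = queries @ docs.T                      # (2, 4) relevance scores
```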
no code implementations • 29 May 2024 • Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations.
no code implementations • 13 Mar 2024 • Minghan Li, Eric Gaussier
Experiments on standard dense retrieval and conversational dense retrieval models both demonstrate improvements over baseline models when they are fine-tuned on the pseudo-relevance labeled data.
1 code implementation • CVPR 2024 • Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang
Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.
Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)
no code implementations • 10 Dec 2023 • Shuai Li, Minghan Li, Pengfei Wang, Lei Zhang
To address these challenges, we present a universal transformer-based framework, abbreviated as OpenSD, which utilizes the same architecture and network parameters to handle open-vocabulary segmentation and detection tasks.
Ranked #8 on Zero Shot Segmentation on Segmentation in the Wild
no code implementations • 15 Nov 2023 • Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky
In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers?
1 code implementation • 26 Mar 2023 • Minghan Li, Lei Zhang
As a result, the amount of pixel-wise annotations in existing video instance segmentation (VIS) datasets is small, limiting the generalization capability of trained VIS models.
Ranked #18 on Video Instance Segmentation on YouTube-VIS 2021
1 code implementation • CVPR 2023 • Minghan Li, Shuai Li, Wangmeng Xiang, Lei Zhang
The proposed MDQE is the first VIS method with per-clip input that achieves state-of-the-art results on challenging videos and competitive performance on simple videos.
1 code implementation • CVPR 2023 • Shuai Li, Minghan Li, Ruihuang Li, Chenhang He, Lei Zhang
The positive and negative weights of these soft anchors are dynamically adjusted during training so that they can contribute more to "representation learning" in the early training stage, and contribute more to "duplicated prediction removal" in the later stage.
1 code implementation • 15 Feb 2023 • Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
We hence propose a new data augmentation (DA) approach with diverse queries and sources of supervision to progressively train a generalizable dense retriever (DR). As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations, and it even competes with models using more complex late interaction (ColBERTv2 and SPLADE++).
Ranked #2 on Passage Retrieval on PeerQA
1 code implementation • 13 Feb 2023 • Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
Multi-vector retrieval methods have demonstrated their effectiveness on various retrieval datasets; among them, ColBERT is the most established method, based on late interaction between the contextualized token embeddings of pre-trained language models.
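For reference, ColBERT's late interaction (MaxSim) scoring is compact enough to state directly; the sketch below uses random embeddings as stand-ins for actual model outputs.

```python
import torch

# Late interaction (MaxSim) scoring as in ColBERT: every query token
# embedding is matched against its most similar document token embedding,
# and the maxima are summed.
def late_interaction_score(Q, D):
    """Q: (num_query_tokens, dim), D: (num_doc_tokens, dim), L2-normalized."""
    sim = Q @ D.T                       # token-to-token cosine similarities
    return sim.max(dim=1).values.sum()  # MaxSim per query token, then sum

Q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
D = torch.nn.functional.normalize(torch.randn(50, 128), dim=-1)
score = late_interaction_score(Q, D)
```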
no code implementations • 13 Feb 2023 • Xinyu Zhang, Minghan Li, Jimmy Lin
Recent progress in information retrieval finds that embedding queries and documents as multiple vectors yields a bi-encoder retriever that is robust on out-of-distribution datasets.
no code implementations • 13 Dec 2022 • Minghan Li, Eric Gaussier
To address this issue, researchers have resorted to adversarial learning and query generation approaches; nevertheless, both approaches have yielded only limited improvements.
1 code implementation • 18 Nov 2022 • Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
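A heavily simplified sketch of the routing idea, assuming one lexical key per token (CITADEL's actual formulation uses learned routing weights and multiple keys per token): query and document tokens interact only when routed to the same key.

```python
import torch

# Simplified sketch of dynamic lexical routing: a router assigns each token
# embedding to its top lexical key, and query/document tokens interact only
# when they share a key. The router matrix and key count are illustrative.
def routed_score(q_emb, d_emb, router):
    q_keys = (q_emb @ router).argmax(-1)   # one lexical key per query token
    d_keys = (d_emb @ router).argmax(-1)   # one lexical key per doc token
    score = 0.0
    for i, k in enumerate(q_keys):
        mask = d_keys == k                 # doc tokens sharing this key
        if mask.any():
            score += (q_emb[i] @ d_emb[mask].T).max()
    return score

router = torch.randn(128, 1000)            # 1000 hypothetical lexical keys
q_emb, d_emb = torch.randn(8, 128), torch.randn(60, 128)
print(routed_score(q_emb, d_emb, router))
```

Because tokens only interact within a shared key, most query-document token pairs are never compared, which is the source of the efficiency gain over exhaustive late interaction.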
no code implementations • 13 Oct 2022 • Linqing Liu, Minghan Li, Jimmy Lin, Sebastian Riedel, Pontus Stenetorp
To balance these two considerations, we propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
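A minimal sketch of that filter-then-fuse recipe, with an illustrative input format and threshold: each context's answer is scored by its generation log-probability, low-probability contexts are dropped, and surviving answers are fused by probability-weighted voting.

```python
import math

# Sketch of filtering and fusing retrieved contexts by the generation
# probability of the answer each context supports. The threshold and the
# (answer, log_prob) input format are illustrative assumptions.
def filter_and_fuse(candidates, log_prob_threshold=-5.0):
    """candidates: list of (answer_string, answer_log_prob) pairs,
    one per retrieved context."""
    kept = [(a, lp) for a, lp in candidates if lp >= log_prob_threshold]
    votes = {}
    for answer, lp in kept:
        votes[answer] = votes.get(answer, 0.0) + math.exp(lp)
    return max(votes, key=votes.get) if votes else None

candidates = [("Paris", -0.7), ("Paris", -1.2), ("Lyon", -6.3)]
print(filter_and_fuse(candidates))  # "Paris"
```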
1 code implementation • 31 Jul 2022 • Sheng-Chieh Lin, Minghan Li, Jimmy Lin
Pre-trained language models have been successful in many knowledge-intensive NLP tasks.
1 code implementation • 19 May 2022 • Minghan Li, Xinyu Zhang, Ji Xin, Hongyang Zhang, Jimmy Lin
For example, on MS MARCO Passage v1, our method yields an average candidate set size of 27 out of 1,000, which increases reranking speed by about 37 times, while MRR@10 remains greater than a pre-specified value of 0.38 with about 90% empirical coverage; the empirical baselines fail to provide such a guarantee.
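One way to realize such a guarantee is split calibration of a first-stage score cutoff on held-out queries; the sketch below is a generic calibration recipe under that assumption, not necessarily the paper's exact certification procedure.

```python
import numpy as np

# Sketch of score-threshold calibration for candidate set pruning: on a
# held-out set, find the largest first-stage score cutoff such that the
# pruned candidate lists still retain a relevant passage for the desired
# fraction of queries.
def calibrate_cutoff(heldout, target_coverage=0.9):
    """heldout: list of (scores, relevant_score) per query, where scores
    are first-stage candidate scores and relevant_score is the score of
    the first relevant passage."""
    grid = np.unique(np.concatenate([s for s, _ in heldout]))
    best = grid.min()
    for tau in grid:
        covered = np.mean([rel >= tau for _, rel in heldout])
        if covered >= target_coverage:
            best = max(best, tau)   # most aggressive cutoff still covered
    return best

heldout = [(np.random.randn(1000), 1.5), (np.random.randn(1000), 0.2)]
tau = calibrate_cutoff(heldout)     # prune candidates scoring below tau
```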
2 code implementations • CVPR 2022 • Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, Lei Zhang
In this work, we propose, to the best of our knowledge for the first time, to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which can be implemented by applying Exact Histogram Matching (EHM) in the image feature space (see the sketch below).
Ranked #7 on Style Transfer on StyleBench
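A minimal sketch of EFDM's sort-based matching, assuming flattened feature maps of equal length; the straight-through step is a common trick to keep gradients flowing and may differ in detail from the released implementation.

```python
import torch

# Exact Feature Distribution Matching via sorting: the sorted style values
# are reassigned to the content features at their rank positions, exactly
# matching the empirical CDFs (i.e., exact histogram matching on features).
def efdm(content, style):
    """content, style: (B, C, N) flattened feature maps of equal shape."""
    sorted_style, _ = torch.sort(style, dim=-1)      # sorted style values
    rank = content.argsort(dim=-1).argsort(dim=-1)   # rank of each value
    matched = sorted_style.gather(-1, rank)          # value reassignment
    return content + (matched - content).detach()    # straight-through grad

content = torch.randn(2, 64, 32 * 32, requires_grad=True)
style = torch.randn(2, 64, 32 * 32)
out = efdm(content, style)   # out's per-channel distribution matches style
```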
no code implementations • 12 Mar 2022 • Minghan Li, Lei Zhang
Based on the fact that adjacent frames in a short clip are highly coherent in content, we propose to extend the one-stage frame-in frame-out (FiFo) framework to a clip-in clip-out (CiCo) one, which performs VIS clip by clip.
1 code implementation • 18 Nov 2021 • Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier
On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness.
no code implementations • 4 Oct 2021 • Minghan Li, Jimmy Lin
Previous work on the generalization of DPR mainly focuses on testing both encoders in tandem on out-of-distribution (OOD) question-answering (QA) tasks, which is also known as domain adaptation.
1 code implementation • 3 May 2021 • Thibaut Thonet, Yagmur Gizem Cinar, Eric Gaussier, Minghan Li, Jean-Michel Renders
To address this shortcoming, we propose SmoothI, a smooth approximation of rank indicators that serves as a basic building block to devise differentiable approximations of IR metrics.
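The sketch below illustrates the general construction of soft rank indicators (an iterated, temperature-controlled softmax with soft masking of already-selected documents); it is a generic soft top-k, not SmoothI's exact formulation.

```python
import torch

# Generic soft rank indicators: row r approximates the one-hot indicator of
# "which document sits at rank r", and documents already (softly) selected
# are masked out before the next rank. Differentiable in the scores.
def soft_rank_indicators(scores, k, alpha=10.0):
    """scores: (n,) relevance scores; returns (k, n) soft indicators."""
    mask = torch.ones_like(scores)
    indicators = []
    for _ in range(k):
        probs = torch.softmax(alpha * scores + torch.log(mask + 1e-9), dim=-1)
        indicators.append(probs)
        mask = mask * (1.0 - probs)     # softly remove selected documents
    return torch.stack(indicators)

scores = torch.tensor([2.0, 0.5, 1.5, -1.0], requires_grad=True)
I = soft_rank_indicators(scores, k=3)   # rows approach one-hot as alpha grows
relevance = torch.tensor([1.0, 0.0, 1.0, 0.0])
soft_prec_at_3 = (I * relevance).sum() / 3   # a differentiable IR metric
```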
1 code implementation • CVPR 2021 • Minghan Li, Shuai Li, Lida Li, Lei Zhang
To further explore temporal correlation among video frames, we incorporate a temporal fusion module to infer instance masks from each frame to its adjacent frames, which helps our framework handle challenging videos with motion blur, partial occlusion, and unusual object-to-camera poses.
Ranked #26 on Video Instance Segmentation on YouTube-VIS 2021
no code implementations • 31 Jul 2020 • Minghan Li, Xialei Liu, Joost Van de Weijer, Bogdan Raducanu
Active learning has emerged as an alternative for alleviating the effort of labeling huge amounts of data for data-hungry applications (such as image/video indexing and retrieval, autonomous driving, etc.).
1 code implementation • 18 Sep 2019 • Hong Wang, Yichen Wu, Minghan Li, Qian Zhao, Deyu Meng
Rain removal from video or from a single image has thus attracted much research attention in the field of computer vision and pattern recognition, and various methods have been proposed for this task in recent years.
1 code implementation • 13 Sep 2019 • Minghan Li, Xiangyong Cao, Qian Zhao, Lei Zhang, Chenqiang Gao, Deyu Meng
Furthermore, a transformation operator imposed on the background scenes is embedded into the proposed model, which finely conveys the dynamic background transformations, such as rotations, scalings, and distortions, that inevitably exist in real video sequences.
no code implementations • 3 Dec 2018 • Minghan Li, Tanli Zuo, Ruicheng Li, Martha White, Wei-Shi Zheng
Knowledge distillation is an effective technique that transfers knowledge from a large teacher model to a shallow student model.
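For reference, the classical distillation loss of Hinton et al., which this line of work builds on: the student matches the teacher's temperature-softened output distribution alongside the usual hard-label cross-entropy.

```python
import torch
import torch.nn.functional as F

# Standard knowledge distillation loss: KL divergence between the student's
# and teacher's temperature-softened distributions, mixed with ordinary
# cross-entropy on the hard labels.
def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2              # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                         torch.randint(0, 10, (8,)))
```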
no code implementations • CVPR 2018 • Minghan Li, Qi Xie, Qian Zhao, Wei Wei, Shuhang Gu, Jing Tao, Deyu Meng
Based on such understanding, we specifically formulate both characteristics into a multiscale convolutional sparse coding (MS-CSC) model for the video rain streak removal task.
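Schematically, a convolutional sparse coding formulation of the rain layer takes the following form (symbols and penalties are illustrative; the full MS-CSC model additionally structures the filters across scales):

```latex
\min_{B,\,\{M_{s,k}\}}\;
\Big\| X - B - \sum_{s}\sum_{k} D_{s,k} \ast M_{s,k} \Big\|_F^2
\;+\; \lambda \sum_{s}\sum_{k} \| M_{s,k} \|_1
```

Here $X$ is the input video, $B$ the background, $D_{s,k}$ a rain-streak filter at scale $s$, and $M_{s,k}$ its sparse feature map; the multiscale filters capture rain streaks of different widths and the sparsity penalty reflects how thinly streaks are scattered across frames.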