Search Results for author: Jie Lei

Found 49 papers, 31 papers with code

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation • CVPR 2021 • Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #24 on Visual Question Answering (VQA) on MSRVTT-QA (using extra training data)

Question Answering Retrieval +4

685

Revealing Single Frame Bias for Video-and-Language Learning

2 code implementations • 7 Jun 2022 • Jie Lei, Tamara L. Berg, Mohit Bansal

Training an effective video-and-language model intuitively requires multiple frames as model inputs.

Ranked #5 on Video Retrieval on SSv2-template retrieval (using extra training data)

Fine-grained Action Recognition Language Modelling +6

685

Unifying Vision-and-Language Tasks via Text Generation

2 code implementations • 4 Feb 2021 • Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, and visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models.

Ranked #3 on Image Captioning on nocaps val

Conditional Text Generation Image Captioning +7

350

SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

1 code implementation • 27 Sep 2022 • Jiaqing Zhang, Jie Lei, Weiying Xie, Zhenman Fang, Yunsong Li, Qian Du

Furthermore, we design a simple and flexible SR branch to learn HR feature representations that can discriminate small objects from vast backgrounds with low-resolution (LR) input, thus further improving the detection accuracy.

Ranked #3 on Object Detection on VEDAI

Real-Time Object Detection Small Object Detection +1

241

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

3 code implementations • 20 Jul 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Ranked #12 on Highlight Detection on QVHighlights

Highlight Detection Moment Retrieval +2

232

Detecting Moments and Highlights in Videos via Natural Language Queries

1 code implementation • NeurIPS 2021 • Jie Lei, Tamara Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Ranked #6 on Video Grounding on QVHighlights

Moment Retrieval Natural Language Queries +2

232

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

Ranked #5 on Video Captioning on ActivityNet Captions

Sentence

168

TVQA: Localized, Compositional Video Question Answering

4 code implementations • EMNLP 2018 • Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg

Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks.

Ranked #4 on Video Question Answering on SUTD-TrafficQA

Video Question Answering

157

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

2 code implementations • ECCV 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

The queries are also labeled with query types that indicate whether each of them is more related to video, subtitle, or both, allowing for in-depth analysis of the dataset and of the methods built on top of it.

Ranked #2 on Video Retrieval on TVR

Moment Retrieval Retrieval +2

147

TVQA+: Spatio-Temporal Grounding for Video Question Answering

3 code implementations • ACL 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos.

Ranked #6 on Video Question Answering on TVQA

Question Answering Video Question Answering

120

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.

Attribute Automatic Speech Recognition +6

110

VindLU: A Recipe for Effective Video-and-Language Pretraining

1 code implementation • CVPR 2023 • Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

Furthermore, our model also obtains state-of-the-art video question-answering results on ActivityNet-QA, MSRVTT-QA, MSRVTT-MC and TVQA.

Ranked #2 on Video Retrieval on Condensed Movies (using extra training data)

Question Answering Retrieval +3

AFPN: Asymptotic Feature Pyramid Network for Object Detection

1 code implementation • 28 Jun 2023 • Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang

Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks.

Object object-detection +1

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

1 code implementation • 8 Jun 2021 • Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task.

Multi-Task Learning Question Answering +5

Vision Transformers are Parameter-Efficient Audio-Visual Learners

1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.

Ranked #4 on Audio-visual Question Answering on MUSIC-AVQA

Audio-visual Question Answering

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

1 code implementation • 21 Jun 2021 • Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal

Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations.

Ranked #10 on Action Recognition on Diving-48

Action Classification Action Recognition +2

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

1 code implementation • EMNLP 2020 • Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

Given a video with aligned dialogue, people can often infer what is more likely to happen next.

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

1 code implementation • 21 Nov 2022 • Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal

We present Perceiver-VL, a vision-and-language framework that efficiently handles high-dimensional multimodal inputs such as long videos and text.

Cross-Modal Retrieval Language Modelling +1

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

1 code implementation • 6 Apr 2022 • Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius

We introduce an audiovisual method for long-range text-to-video retrieval.

Retrieval Text to Video Retrieval +1

MTVR: Multilingual Moment Retrieval in Videos

1 code implementation • ACL 2021 • Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.

Moment Retrieval Retrieval

Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching

1 code implementation • 31 Dec 2022 • Jiaqing Zhang, Jie Lei, Weiying Xie, Yunsong Li, Xiuping Jia

More concretely, we first design a structure called guided quantization self-distillation (GQSD), an innovative approach to achieving lightweight models through the synergy of quantization and distillation.

Ranked #1 on Object Detection on VEDAI

object-detection Object Detection +1

RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios

1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji

We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.

Event Extraction

Transcoded Video Restoration by Temporal Spatial Auxiliary Network

1 code implementation • 15 Dec 2021 • Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

On most video platforms, such as YouTube and TikTok, the played videos have usually undergone multiple encodings, such as hardware encoding by recording devices, software encoding by video editing apps, and single or multiple transcodings by video application servers.

Video Editing Video Restoration

Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution

1 code implementation • CVPR 2023 • Weiying Xie, Kai Jiang, Yunsong Li, Jie Lei, Leyuan Fang, Wen-jin Guo

Specifically, we create a positive cycle between fusion and degradation estimation under a new probabilistic framework.

Super-Resolution

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

1 code implementation • NAACL 2021 • Zineng Tang, Jie Lei, Mohit Bansal

Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss that encourages the model to automatically focus on the correct caption from a pool of candidate ASR captions.

Question Answering Retrieval +4

Edge-competing Pathological Liver Vessel Segmentation with Limited Labels

1 code implementation • 1 Aug 2021 • Zunlei Feng, Zhonghua Wang, Xinchao Wang, Xiuming Zhang, Lechao Cheng, Jie Lei, Yuexuan Wang, Mingli Song

Diagnosing MVI requires finding the vessels that contain hepatocellular carcinoma cells and counting those cells in each vessel, a process that depends heavily on the doctor's experience and is largely subjective and time-consuming.

Segmentation whole slide images

EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection

1 code implementation • 14 Mar 2024 • Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Geng Yang, Wenbo Huang, Yunsong Li

Multimodal image fusion and object detection play a vital role in autonomous driving.

Autonomous Driving object-detection +1

Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

1 code implementation • 6 Jan 2024 • Jiaqing Zhang, Jie Lei, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li

Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research.

Domain Adaptation Specificity +1

Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

1 code implementation • 6 Jan 2024 • Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

Additionally, the information distribution flow (IDF) in MIVit enhances performance-awareness by distributing global classification information across different modalities' feature maps.

Land Cover Classification

Mid-level Representation Enhancement and Graph Embedded Uncertainty Suppressing for Facial Expression Recognition

1 code implementation • 27 Jul 2022 • Jie Lei, Zhao Liu, Zeyu Zou, Tong Li, Xu Juan, Shuaiwei Wang, Guoyu Yang, Zunlei Feng

On the other hand, GUS is introduced to suppress the feature ambiguity in the representation space.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Physics Inspired Criterion for Pruning-Quantization Joint Learning

1 code implementation • 1 Dec 2023 • Weiying Xie, Xiaoyi Fan, Xin Zhang, Yunsong Li, Jie Lei, Leyuan Fang

Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices.

Image Classification Model Compression +1

TripletGAN: Training Generative Model with Triplet Loss

no code implementations • 14 Nov 2017 • Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song

As an effective form of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (person-ReID), achieving state-of-the-art results in many of them.

Face Recognition General Classification +1

Selective Zero-Shot Classification with Augmented Attributes

no code implementations • ECCV 2018 • Jie Song, Chengchao Shen, Jie Lei, An-Xiang Zeng, Kairi Ou, DaCheng Tao, Mingli Song

We propose a selective zero-shot classifier based on both the human-defined and the automatically discovered residual attributes.

Attribute Classification +2

A Novel Multi-Step Finite-State Automaton for Arbitrarily Deterministic Tsetlin Machine Learning

no code implementations • 4 Jul 2020 • K. Darshana Abeyrathna, Ole-Christoffer Granmo, Rishad Shafik, Alex Yakovlev, Adrian Wheeldon, Jie Lei, Morten Goodwin

However, TMs rely heavily on energy-costly random number generation to stochastically guide a team of Tsetlin Automata to a Nash Equilibrium of the TM game.

BIG-bench Machine Learning

MCM-aware Twin-least-square GAN for Hyperspectral Anomaly Detection

no code implementations • 1 Jan 2021 • Jiaping Zhong, Weiying Xie, Jie Lei, Yunsong Li, Zan Li

Hyperspectral anomaly detection under high-dimensional data and interference from deteriorated bands, without any prior information, has been challenging and has attracted close attention for exploring the unknown in real scenarios.

Anomaly Detection Generative Adversarial Network

Sparse Coding-inspired GAN for Weakly Supervised Hyperspectral Anomaly Detection

no code implementations • 1 Jan 2021 • Tao Jiang, Weiying Xie, Jie Lei, Yunsong Li, Zan Li

To solve these problems, this paper proposes a sparse coding-inspired generative adversarial network (GAN) for weakly supervised HAD, named sparseHAD.

Anomaly Detection Generative Adversarial Network +1

One-sample Guided Object Representation Disassembling

no code implementations • NeurIPS 2020 • Zunlei Feng, Yongming He, Xinchao Wang, Xin Gao, Jie Lei, Cheng Jin, Mingli Song

In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.

Data Augmentation Image Classification +1

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

no code implementations • ICCV 2021 • Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu

We hope our Adversarial VQA dataset can shed new light on robustness study in the community and serve as a valuable benchmark for future work.

Data Augmentation Question Answering +1

Boundary Knowledge Translation based Reference Semantic Segmentation

no code implementations • 1 Aug 2021 • Lechao Cheng, Zunlei Feng, Xinchao Wang, Ya Jie Liu, Jie Lei, Mingli Song

In this paper, we introduce a novel Reference semantic segmentation Network (Ref-Net) to conduct visual boundary knowledge translation.

Segmentation Semantic Segmentation +1

Mutual-Complementing Framework for Nuclei Detection and Segmentation in Pathology Image

no code implementations • ICCV 2021 • Zunlei Feng, Zhonghua Wang, Xinchao Wang, Yining Mao, Thomas Li, Jie Lei, Yuexuan Wang, Mingli Song

The two existing unsupervised methods are prone to failure on degenerated samples.

Segmentation

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

no code implementations • 10 Mar 2022 • Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu

In this work, we propose LoopITR, which combines dual and cross encoder architectures in the same network for joint learning.

Retrieval Text Retrieval

CNN LEGO: Disassembling and Assembling Convolutional Neural Network

no code implementations • 25 Mar 2022 • Jiacong Hu, Jing Gao, Zunlei Feng, Lechao Cheng, Jie Lei, Hujun Bao, Mingli Song

the feature maps are adopted to locate the critical features in each layer.

Incremental Learning Knowledge Distillation +2

Toward matrix multiplication for deep learning inference on the Xilinx Versal

no code implementations • 15 Feb 2023 • Jie Lei, José Flich, Enrique S. Quintana-Ortí

The remarkable positive impact of Deep Neural Networks on many Artificial Intelligence (AI) tasks has led to the development of various high performance algorithms as well as specialized processors and accelerators.

valid

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

no code implementations • 10 Apr 2023 • Lin Chen, Zhijie Jia, Tian Qiu, Lechao Cheng, Jie Lei, Zunlei Feng, Mingli Song

In this work, we propose a new paradigm dubbed Decision Stream Calibration that boosts the performance of general Vision Transformers.

A Loopback Network for Explainable Microvascular Invasion Classification

no code implementations • CVPR 2023 • Shengxuming Zhang, Tianqi Shi, Yang Jiang, Xiuming Zhang, Jie Lei, Zunlei Feng, Mingli Song

The loopback between the two branches enables the category label to supervise the cell-locating branch so that it learns to locate cancerous areas.

Binary Classification Classification

SAR-Net: Multi-scale Direction-aware SAR Network via Global Information Fusion

no code implementations • 28 Dec 2023 • Mingxiang Cao, Jie Lei, Weiying Xie, Jiaqing Zhang, Daixun Li, Yunsong Li

Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery.

object-detection Object Detection +1

SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

no code implementations • 10 Jan 2024 • Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li

Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing.

Change Detection Contrastive Learning +2

Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios

no code implementations • 4 Feb 2024 • Yuxin Wang, Zunlei Feng, Haofei Zhang, Yang Gao, Jie Lei, Li Sun, Mingli Song

Due to the inability to receive signals from the Global Navigation Satellite System (GNSS) in extreme conditions, achieving accurate and robust navigation for Unmanned Aerial Vehicles (UAVs) is a challenging task.

SPAR: Personalized Content-Based Recommendation via Long Engagement Attention

no code implementations • 16 Feb 2024 • Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long

Leveraging users' long engagement histories is essential for personalized content recommendations.

Language Modelling Large Language Model
