Search Results for author: Jie Lei

Found 27 papers, 15 papers with code

Transcoded Video Restoration by Temporal Spatial Auxiliary Network

1 code implementation 15 Dec 2021 Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

On most video platforms, such as YouTube and TikTok, the played videos have usually undergone multiple video encodings, such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers.

Frame Video Restoration

Detecting Moments and Highlights in Videos via Natural Language Queries

1 code implementation NeurIPS 2021 Jie Lei, Tamara Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Moment Retrieval

Boundary Knowledge Translation based Reference Semantic Segmentation

no code implementations 1 Aug 2021 Lechao Cheng, Zunlei Feng, Xinchao Wang, Ya Jie Liu, Jie Lei, Mingli Song

In this paper, we introduce a novel Reference semantic segmentation Network (Ref-Net) to conduct visual boundary knowledge translation.

Semantic Segmentation Translation

Edge-competing Pathological Liver Vessel Segmentation with Limited Labels

1 code implementation 1 Aug 2021 Zunlei Feng, Zhonghua Wang, Xinchao Wang, Xiuming Zhang, Lechao Cheng, Jie Lei, Yuexuan Wang, Mingli Song

The diagnosis of MVI requires finding the vessels that contain hepatocellular carcinoma cells and counting the cells in each vessel, a process that depends heavily on the doctor's experience and is largely subjective and time-consuming.

whole slide images

MTVR: Multilingual Moment Retrieval in Videos

1 code implementation ACL 2021 Jie Lei, Tamara L. Berg, Mohit Bansal

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips.

Moment Retrieval

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

2 code implementations 20 Jul 2021 Jie Lei, Tamara L. Berg, Mohit Bansal

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Moment Retrieval

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

1 code implementation 21 Jun 2021 Hao Tan, Jie Lei, Thomas Wolf, Mohit Bansal

Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations.

Action Classification Action Recognition +2

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

no code implementations ICCV 2021 Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu

We hope our Adversarial VQA dataset can shed new light on robustness study in the community and serve as a valuable benchmark for future work.

Data Augmentation Question Answering +2

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

1 code implementation NAACL 2021 Zineng Tang, Jie Lei, Mohit Bansal

Second, to alleviate the temporal misalignment issue, our method incorporates an entropy minimization-based constrained attention loss, to encourage the model to automatically focus on the correct caption from a pool of candidate ASR captions.

Question Answering Text to Video Retrieval +3

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

1 code implementation CVPR 2021 Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu

Experiments on text-to-video retrieval and video question answering on six datasets demonstrate that ClipBERT outperforms (or is on par with) existing methods that exploit full-length videos, suggesting that end-to-end learning with just a few sparsely sampled clips is often more accurate than using densely extracted offline features from full-length videos, proving the proverbial less-is-more principle.

Ranked #4 on Visual Question Answering on MSRVTT-QA (using extra training data)

Question Answering Text to Video Retrieval +3

Unifying Vision-and-Language Tasks via Text Generation

1 code implementation 4 Feb 2021 Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal

On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, visual commonsense reasoning, most of which have been previously modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches comparable performance to recent task-specific state-of-the-art vision-and-language models.

Conditional Text Generation Image Captioning +7

Sparse Coding-inspired GAN for Weakly Supervised Hyperspectral Anomaly Detection

no code implementations 1 Jan 2021 Tao Jiang, Weiying Xie, Jie Lei, Yunsong Li, Zan Li

To solve these problems, this paper proposes a sparse coding-inspired generative adversarial network (GAN) for weakly supervised HAD, named sparseHAD.

Anomaly Detection

MCM-aware Twin-least-square GAN for Hyperspectral Anomaly Detection

no code implementations 1 Jan 2021 Jiaping Zhong, Weiying Xie, Jie Lei, Yunsong Li, Zan Li

Hyperspectral anomaly detection under high-dimensional data and interference from deteriorated bands, without any prior information, is challenging and has attracted close attention for exploring the unknown in real scenarios.

Anomaly Detection

One-sample Guided Object Representation Disassembling

no code implementations NeurIPS 2020 Zunlei Feng, Yongming He, Xinchao Wang, Xin Gao, Jie Lei, Cheng Jin, Mingli Song

In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.

Data Augmentation Image Classification

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

1 code implementation EMNLP 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

Given a video with aligned dialogue, people can often infer what is more likely to happen next.

A Novel Multi-Step Finite-State Automaton for Arbitrarily Deterministic Tsetlin Machine Learning

no code implementations 4 Jul 2020 K. Darshana Abeyrathna, Ole-Christoffer Granmo, Rishad Shafik, Alex Yakovlev, Adrian Wheeldon, Jie Lei, Morten Goodwin

However, TMs rely heavily on energy-costly random number generation to stochastically guide a team of Tsetlin Automata to a Nash Equilibrium of the TM game.

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

1 code implementation ACL 2020 Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

2 code implementations ECCV 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

The queries are also labeled with query types that indicate whether each of them is more related to video or subtitle or both, allowing for in-depth analysis of the dataset and the methods built on top of it.

Moment Retrieval Video Corpus Moment Retrieval +1

TVQA+: Spatio-Temporal Grounding for Video Question Answering

3 code implementations ACL 2020 Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos.

Question Answering Video Question Answering

TripletGAN: Training Generative Model with Triplet Loss

no code implementations 14 Nov 2017 Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song

As an effective approach to metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (person-ReID), leading to many state-of-the-art results.

Face Recognition General Classification +1
