Search Results for author: Mingqian Tang

Found 24 papers, 10 papers with code

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

no code implementations • 2 Nov 2022 • Yixuan Pei, Zhiwu Qing, Jun Cen, Xiang Wang, Shiwei Zhang, Yaxiong Wang, Mingqian Tang, Nong Sang, Xueming Qian

The former reduces the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate for the spatio-temporal details lost in the Frame Condensing stage.

Action Recognition Class Incremental Learning +1

Grow and Merge: A Unified Framework for Continuous Categories Discovery

no code implementations • 9 Oct 2022 • Xinwei Zhang, Jianwen Jiang, Yutong Feng, Zhi-Fan Wu, Xibin Zhao, Hai Wan, Mingqian Tang, Rong Jin, Yue Gao

Although a number of studies are devoted to novel category discovery, most of them assume a static setting where both labeled and unlabeled data are given at once for finding new categories.

Self-Supervised Learning

RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection

3 code implementations • 5 Sep 2022 • Hangjie Yuan, Jianwen Jiang, Samuel Albanie, Tao Feng, Ziyuan Huang, Dong Ni, Mingqian Tang

The task of Human-Object Interaction (HOI) detection targets fine-grained visual parsing of humans interacting with their environment, enabling a broad range of applications.

Ranked #16 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection Relation +1

Open-world Semantic Segmentation for LIDAR Point Clouds

1 code implementation • 4 Jul 2022 • Jun Cen, Peng Yun, Shiwei Zhang, Junhao Cai, Di Luan, Michael Yu Wang, Ming Liu, Mingqian Tang

Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since they are closed-set and static.

Autonomous Driving Incremental Learning +3

Hybrid Relation Guided Set Matching for Few-shot Action Recognition

1 code implementation • CVPR 2022 • Xiang Wang, Shiwei Zhang, Zhiwu Qing, Mingqian Tang, Zhengrong Zuo, Changxin Gao, Rong Jin, Nong Sang

To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.

Ranked #1 on Few Shot Action Recognition on Something-Something-100

Few Shot Action Recognition Relation +1

Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

no code implementations • CVPR 2022 • Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos.

Contrastive Learning Representation Learning +1

TAda! Temporally-Adaptive Convolutions for Video Understanding

2 code implementations • ICLR 2022 • Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Mingqian Tang, Ziwei Liu, Marcelo H. Ang Jr

This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding, which shows that adaptive weight calibration along the temporal dimension is an efficient way to facilitate modelling complex temporal dynamics in videos.

Ranked #67 on Action Recognition on Something-Something V2 (using extra training data)

Action Classification Action Recognition +2

Rethinking Supervised Pre-training for Better Downstream Transferring

no code implementations • ICLR 2022 • Yutong Feng, Jianwen Jiang, Mingqian Tang, Rong Jin, Yue Gao

Although the pre-training stage is in most cases conducted with supervised methods, recent works on self-supervised pre-training have shown powerful transferability and even outperform supervised pre-training on multiple downstream tasks.

Open-Ended Question Answering

NGC: A Unified Framework for Learning with Open-World Noisy Data

no code implementations • ICCV 2021 • Zhi-Fan Wu, Tong Wei, Jianwen Jiang, Chaojie Mao, Mingqian Tang, Yu-Feng Li

The existence of noisy data is prevalent in both the training and testing phases of machine learning systems, which inevitably leads to the degradation of model performance.

Ranked #18 on Image Classification on mini WebVision 1.0

Image Classification

Support-Set Based Cross-Supervision for Video Grounding

no code implementations • ICCV 2021 • Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao

The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.

Contrastive Learning Video Grounding

ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

1 code implementation • 24 Aug 2021 • Zhiwu Qing, Ziyuan Huang, Shiwei Zhang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Rong Jin, Nong Sang

The visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and the learned change in the disparity along the training process is beneficial to learning a strong representation.

Contrastive Learning

Video Similarity and Alignment Learning on Partial Video Copy Detection

no code implementations • 4 Aug 2021 • Zhen Han, Xiangteng He, Mingqian Tang, Yiliang Lv

To address the above issues, we propose the Video Similarity and Alignment Learning (VSAL) approach, which jointly models spatial similarity, temporal similarity and partial alignment.

Copy Detection Partial Video Copy Detection +1

HANet: Hierarchical Alignment Networks for Video-Text Retrieval

1 code implementation • 26 Jul 2021 • Peng Wu, Xiangteng He, Mingqian Tang, Yiliang Lv, Jing Liu

Based on these, we naturally construct hierarchical representations in an individual-local-global manner, where the individual level focuses on the alignment between frame and word, the local level focuses on the alignment between video clip and textual context, and the global level focuses on the alignment between the whole video and text.

Retrieval Text Matching +3

Exploring Stronger Feature for Temporal Action Localization

no code implementations • 24 Jun 2021 • Zhiwu Qing, Xiang Wang, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

Temporal action localization aims to localize the starting and ending times of actions and to identify their categories.

Temporal Action Localization

Proposal Relation Network for Temporal Action Detection

1 code implementation • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Nong Sang

We calculate the detection results by assigning the corresponding classification results to the proposals.

Ranked #2 on Temporal Action Localization on ActivityNet-1.3 (using extra training data)

Action Classification Action Detection +3

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations • 20 Jun 2021 • Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning Weakly-supervised Temporal Action Localization +1

Relation Modeling in Spatio-Temporal Action Localization

no code implementations • 15 Jun 2021 • Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao

This paper presents our solution to the AVA-Kinetics Crossover Challenge of the ActivityNet workshop at CVPR 2021.

Ranked #4 on Spatio-Temporal Action Localization on AVA-Kinetics (using extra training data)

Action Detection Relation +2

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation • 13 Jun 2021 • Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted at the CVPR 2021 Workshop.

Action Detection

Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition

1 code implementation • 9 Jun 2021 • Ziyuan Huang, Zhiwu Qing, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, Marcelo H. Ang Jr

In this paper, we present empirical results for training a stronger video vision transformer on the EPIC-KITCHENS-100 Action Recognition dataset.

Action Recognition Point Cloud Classification +1

Self-supervised Video Retrieval Transformer Network

no code implementations • 16 Apr 2021 • Xiangteng He, Yulin Pan, Mingqian Tang, Yiliang Lv

In addition, most retrieval systems are based on frame-level features for video similarity searching, making them expensive in terms of both storage and search.

Retrieval Self-supervised Video Retrieval +2

Self-supervised Motion Learning from Static Images

1 code implementation • CVPR 2021 • Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin, Marcelo Ang

We furthermore introduce a static mask in pseudo motions to create local motion patterns, which forces the model to additionally locate notable motion areas for the correct classification. We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.

Action Recognition Self-Supervised Learning

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations • 1 Jan 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

Price Suggestion for Online Second-hand Items with Texts and Images

no code implementations • 10 Dec 2020 • Liang Han, Zhaozheng Yin, Zhurong Xia, Mingqian Tang, Rong Jin

The goal of price prediction is to help sellers set effective and reasonable prices for their second-hand items with the images and text descriptions uploaded to the online platforms.

Binary Classification Regression

Vision-based Price Suggestion for Online Second-hand Items

no code implementations • 10 Dec 2020 • Liang Han, Zhaozheng Yin, Zhurong Xia, Li Guo, Mingqian Tang, Rong Jin

Then, we design a vision-based price suggestion module which takes the extracted visual features along with some statistical item features from the shopping platform as the inputs to determine whether an uploaded item image is qualified for price suggestion by a binary classification model, and provide price suggestions for items with qualified images by a regression model.

Binary Classification Decision Making +1
