VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

facebookresearch/fairseq EMNLP 2021

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Action Segmentation, Video Retrieval

16,980
8.25 stars / hour

PaddleNLP

PaddlePaddle/PaddleNLP 17 Sep 2019

An easy-to-use and powerful NLP library with an extensive model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for Neural Search, Question Answering, Information Extraction, and Sentiment Analysis.

Language Modelling, Reading Comprehension

3,795
3.23 stars / hour

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

hongfz16/avatarclip 17 May 2022

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

Language Modelling, motion synthesis, +1

167
2.50 stars / hour

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

hpcaitech/colossalai 28 Oct 2021

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

2D Human Pose Estimation

3,136
2.19 stars / hour

Vision Transformer Adapter for Dense Predictions

czczup/vit-adapter 17 May 2022

When fine-tuning on downstream tasks, a modality-specific adapter is used to introduce the data and tasks' prior information into the model, making it suitable for these tasks.

Instance Segmentation, Object Detection, +1

46
0.83 stars / hour

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

apple/ml-hierarchical-confusion-matrix 24 Oct 2021

The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances.

194
0.70 stars / hour
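
The tabular layout described in the Neo entry above, predicted class labels compared against actual class labels over all data instances, can be illustrated with a minimal sketch. This is only the conventional flat confusion matrix, not Neo's hierarchical or multi-output variant; the function name `confusion_matrix`, the label set, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Build a flat confusion matrix.

    Rows correspond to actual classes and columns to predicted classes,
    both ordered as in `labels`.
    """
    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occur.
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical toy data: six instances over three classes.
labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
```

Diagonal cells count correct predictions, and every off-diagonal cell counts one specific kind of confusion (for example, an actual "cat" predicted as "dog").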

Towards An End-to-End Framework for Flow-Guided Video Inpainting

MCG-NKU/E2FGVI 6 Apr 2022

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories.

Optical Flow Estimation, Video Inpainting

214
0.69 stars / hour

A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation

spotify/basic-pitch 18 Mar 2022

Despite its simplicity, benchmark results show our system's note estimation to be substantially better than a comparable baseline, and its frame-level accuracy to be only marginally below that of specialized state-of-the-art AMT systems.

Music Transcription

99
0.63 stars / hour

Zero-Shot Text-to-Image Generation

borisdayma/dalle-mini 24 Feb 2021

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.

Text to image generation, Zero-Shot Text-to-Image Generation

1,387
0.61 stars / hour

The Primacy Bias in Deep Reinforcement Learning

evgenii-nikishin/rl_with_resets 16 May 2022

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.

Atari Games 100k, reinforcement-learning

15
0.50 stars / hour