VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

facebookresearch/fairseq EMNLP 2021

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Action Segmentation, Video Retrieval

16,968
8.13 stars / hour

PaddleNLP

PaddlePaddle/PaddleNLP ICLR 2021

An easy-to-use and powerful NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for neural search, question answering, information extraction, and sentiment analysis.

Cross-Lingual Natural Language Inference, Cross-Lingual NER, +4

3,795
2.42 stars / hour

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

hpcaitech/colossalai 28 Oct 2021

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing.

2D Human Pose Estimation

3,003
1.78 stars / hour

A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation

spotify/basic-pitch 18 Mar 2022

Despite its simplicity, benchmark results show our system's note estimation to be substantially better than a comparable baseline, and its frame-level accuracy to be only marginally below those of specialized state-of-the-art AMT systems.

Music Transcription

99
0.79 stars / hour

Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels

apple/ml-hierarchical-confusion-matrix 24 Oct 2021

The confusion matrix, a ubiquitous visualization for helping people evaluate machine learning models, is a tabular layout that compares predicted class labels against actual class labels over all data instances.

185
0.65 stars / hour
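
The flat confusion matrix described in the Neo entry above can be illustrated with a minimal sketch. This is not code from apple/ml-hierarchical-confusion-matrix; the function name `confusion_matrix` and the sample labels are hypothetical, chosen only to show the tabular layout in which rows are actual classes and columns are predicted classes.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Build a flat confusion matrix as a nested list.

    Rows correspond to actual labels and columns to predicted labels,
    both in the order given by `labels`. (Hypothetical helper, for
    illustration only.)
    """
    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occurred.
    return [[counts[(a, p)] for p in labels] for a in labels]


# Made-up sample data, not taken from the paper.
labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "bird", "dog", "bird"]
predicted = ["cat", "dog", "dog", "bird", "dog", "cat"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}", row)
```

Each diagonal cell counts correct predictions for one class, and each off-diagonal cell counts one kind of confusion. The hierarchical and multi-output extensions that Neo proposes are beyond what this flat sketch covers.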

OPT: Open Pre-trained Transformer Language Models

facebookresearch/metaseq 2 May 2022

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Hate Speech Detection, Language Modelling, +1

2,894
0.63 stars / hour

Thin-Plate Spline Motion Model for Image Animation

yoyo-nb/thin-plate-spline-motion-model 27 Mar 2022

Firstly, we propose thin-plate spline motion estimation to produce a more flexible optical flow, which warps the feature maps of the source image to the feature domain of the driving image.

Image Animation, Motion Estimation, +1

382
0.48 stars / hour

Towards An End-to-End Framework for Flow-Guided Video Inpainting

MCG-NKU/E2FGVI 6 Apr 2022

Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories.

Optical Flow Estimation, Video Inpainting

214
0.46 stars / hour

HeadNeRF: A Real-time NeRF-based Parametric Head Model

crishy1995/headnerf 10 Dec 2021

Unlike existing related parametric models, we use neural radiance fields as a novel 3D proxy instead of the traditional 3D textured mesh, which enables HeadNeRF to generate high-fidelity images.

Neural Rendering

113
0.44 stars / hour

Goal-Guided Neural Cellular Automata: Learning to Control Self-Organising Systems

shyamsn97/controllable-ncas 25 Apr 2022

Inspired by cellular growth and self-organization, Neural Cellular Automata (NCAs) have been capable of "growing" artificial cells into images, 3D structures, and even functional machines.

21
0.40 stars / hour