RoFormer: Enhanced Transformer with Rotary Position Embedding

kingoflolz/mesh-transformer-jax 20 Apr 2021

We investigate various methods to encode positional information in transformer-based language models and propose a novel implementation named Rotary Position Embedding (RoPE).

Semantic Text Matching

SimSwap: An Efficient Framework For High Fidelity Face Swapping

neuralchen/SimSwap 11 Jun 2021

In contrast to previous approaches that either lack the ability to generalize to arbitrary identity or fail to preserve attributes like facial expression and gaze direction, our framework is capable of transferring the identity of an arbitrary source face into an arbitrary target face while preserving the attributes of the target face.

Ranked #1 on Face Swapping on FaceForensics++ (ID retrieval metric)

Face Swapping

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

antoyang/just-ask 1 Dec 2020

In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering, making use of automatic cross-modal supervision.

Ranked #1 on Visual Question Answering on MSVD-QA (using extra training data)

Question Answering, Question Generation, +3

PVTv2: Improved Baselines with Pyramid Vision Transformer

whai362/PVT 25 Jun 2021

We hope this work will facilitate state-of-the-art Transformer research in computer vision.

Image Classification, Object Detection, +1

You Do Not Need a Bigger Boat: Recommendations at Reasonable Scale in a (Mostly) Serverless and Open Stack

jacopotagliabue/metaflow-intent-prediction 15 Jul 2021

We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems.

Recommendation Systems

CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

Open-Speech-EkStep/vakyansh-models 15 Jul 2021

We present CLSRIL-23, a self-supervised audio pre-trained model which learns cross-lingual speech representations from raw audio across 23 Indic languages.

Self-Supervised Learning, Speech Recognition

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

gradio-app/gradio 6 Jun 2019

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, and allow embedding the interface in IPython notebooks.

A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit

rafaelpadilla/review_object_detection_metrics 25 Jan 2021

Recent outstanding results of supervised object detection in competitions and challenges are often associated with specific metrics and datasets.

Object Detection

PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation

PaddlePaddle/PaddleSeg 15 Jan 2021

The toolkit aims to help both developers and researchers in the whole process of designing segmentation models, training models, optimizing performance and inference speed, and deploying models.

Autonomous Driving, Human Part Segmentation, +2

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

PaddlePaddle/PaddleSeg 2 Jun 2016

ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales.

Semantic Segmentation
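
As a rough illustration of the ASPP idea described in the DeepLab entry above, the sketch below applies one filter to a 1-D signal at several sampling (dilation) rates and sums the branches. It is a simplified sketch, not the DeepLab implementation: the function names `atrous_conv1d` and `aspp_1d`, the shared kernel, the rates (1, 2, 4), and the zero padding are all assumptions made for this example, whereas the actual model operates on 2-D feature maps with learned filters in each branch.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding; output has the input's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # filter taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run the same kernel at several sampling rates in parallel and sum the branches."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(aspp_1d(impulse, [1, 1, 1]))
```

With a single impulse as input, each branch responds at offsets equal to its rate, so the summed output covers a wider context than any one branch while keeping the same three-tap filter.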

