Ivy: Templated Deep Learning for Inter-Framework Portability

ivy-dl/ivy 4 Feb 2021

We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

Text to image generation Text-to-Image Generation

A Conversational Paradigm for Program Synthesis

salesforce/CodeGen 25 Mar 2022

We train a family of large language models, called CodeGen, on natural language and programming language data.

Language Modelling Program Synthesis

In Defense of Online Models for Video Instance Segmentation

wjf5203/vnext 21 Jul 2022

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance.

Contrastive Learning Instance Segmentation

Masked Autoencoders that Listen

rishikksh20/AudioMAE-pytorch 13 Jul 2022

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.

Audio Classification Representation Learning

Reconstructing 3D Human Pose by Watching Humans in the Mirror

zju3dv/EasyMocap CVPR 2021

In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror.

3D Pose Estimation

AvatarGen: a 3D Generative Model for Animatable Human Avatars

jfzhang95/avatargen 1 Aug 2022

Unsupervised generation of clothed virtual humans with various appearance and animatable poses is important for creating 3D human avatars and other AR/VR applications.

Package for Fast ABC-Boost

pltrees/abcboost 18 Jul 2022

Although the gain formula in Li (2010) was derived for logistic regression loss, it is a generic formula for loss functions with second derivatives.

Multi-class Classification

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition

