OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition (OCR)

Conditional Image-to-Video Generation with Latent Flow Diffusion Models

nihaomiao/cvpr23_lfdm 24 Mar 2023

In this paper, we propose an approach for conditional image-to-video (cI2V) generation using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image.

Image to Video Generation, Optical Flow Estimation

Training language models to follow instructions with human feedback

ggerganov/llama.cpp 4 Mar 2022

In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

ist-daslab/sparsegpt 2 Jan 2023

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy.

Ranked #1 on Language Modelling on WikiText-2 (using extra training data)

Common Sense Reasoning, Language Modelling

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform

smazzanti/mrmr 15 Aug 2019

This paper describes an approach to extending, evaluating, and implementing mRMR feature selection methods for classification problems in a marketing machine learning platform at Uber that automates the creation and deployment of targeting and personalization models at scale.

BIG-bench Machine Learning, Marketing

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

winfredy/sadtalker 22 Nov 2022

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Talking Head Generation

Anti-DreamBooth: Protecting users from personalized text-to-image synthesis

vinairesearch/anti-dreambooth 27 Mar 2023

Despite the complicated formulation of DreamBooth and diffusion-based text-to-image models, our methods effectively defend users from the malicious use of those models.

Image Generation

Masked Diffusion Transformer is a Strong Image Synthesizer

sail-sg/mdt 25 Mar 2023

Despite its success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process.

Image Generation

Scaling Expert Language Models with Unsupervised Domain Discovery

kernelmachine/cbtm 24 Mar 2023

Large language models are typically trained densely: all parameters are updated with respect to all inputs.

Language Modelling

