Unifying Vision, Text, and Layout for Universal Document Processing

microsoft/udop 5 Dec 2022

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Document AI Image Reconstruction

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

fudan-zvg/seaformer 30 Jan 2023

Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K and Cityscapes datasets.

Image Classification Semantic Segmentation

PhyCV: The First Physics-inspired Computer Vision Library

jalalilabucla/phycv 29 Jan 2023

PhyCV is the first computer vision library which utilizes algorithms directly derived from the equations of physics governing physical phenomena.

Text2LIVE: Text-Driven Layered Image and Video Editing

omerbt/Text2LIVE 5 Apr 2022

Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner.

Fine-Tuning Language Models from Human Preferences

lvwerra/trl 18 Sep 2019

Most work on reward learning has used simulated environments, but complex information about values is often expressed in natural language, and we believe reward learning for language is a key to making RL practical and safe for real-world tasks.

Language Modelling

ProGen2: Exploring the Boundaries of Protein Language Models

salesforce/progen 27 Jun 2022

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design.

High Fidelity Neural Audio Compression

facebookresearch/encodec 24 Oct 2022

We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.

Audio Signal Processing

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

automatic1111/stable-diffusion-webui 12 Nov 2022

In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.

Contrastive Learning Cross-Modal Retrieval

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

hazyresearch/h3 28 Dec 2022

First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.

Few-Shot Learning Language Modelling

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

sarafridov/k-planes 24 Jan 2023

We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions.

Novel View Synthesis

