BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

microsoft/biogpt 19 Oct 2022

Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain.

Document Classification, Language Modelling (+3 more)

1,024 stars (3.73 stars / hour)

Multimodal Chain-of-Thought Reasoning in Language Models

amazon-science/mm-cot 2 Feb 2023

By incorporating vision features in both the rationale-generation and answer-inference stages, the model can generate effective rationales that support answer inference.

208 stars (2.64 stars / hour)

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

AttendAndExcite/Attend-and-Excite 31 Jan 2023

Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.

Generative Semantic Nursing

196 stars (1.60 stars / hour)

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis 30 Jan 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.

Image Captioning, Image Retrieval (+5 more)

1,830 stars (0.88 stars / hour)

STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

ucaszyp/steps 2 Feb 2023

By fitting a bridge-shaped curve to the illumination map distribution, both regions are suppressed and the two tasks are bridged naturally.

Depth Estimation, Image Enhancement

64 stars (0.79 stars / hour)

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

showlab/Tune-A-Video 22 Dec 2022

To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning.

Style Transfer, Text-to-Video Generation (+1 more)

484 stars (0.76 stars / hour)

InstructPix2Pix: Learning to Follow Image Editing Instructions

timothybrooks/instruct-pix2pix 17 Nov 2022

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.

Language Modelling, Text-based Image Editing (+1 more)

3,134 stars (0.66 stars / hour)

Towards Robust Blind Face Restoration with Codebook Lookup Transformer

sczhou/codeformer 22 Jun 2022

In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.

Blind Face Restoration

4,407 stars (0.65 stars / hour)
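The CodeFormer excerpt frames blind face restoration as predicting indices into a learned codebook rather than regressing pixels. The minimal, hypothetical sketch below uses toy numbers and no network. It shows only the generic pieces that framing relies on: nearest-neighbor quantization of features to codebook indices, turning per-position scores into indices (the "code prediction" step), and looking those indices up in the codebook. All function names and values are illustrative assumptions, not CodeFormer's implementation. Per the paper's title, a Transformer produces the scores, and that network is not modeled here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code_indices(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Vector-quantization step: index of the closest codebook entry per feature.

    In a code-prediction setup, indices obtained this way (e.g. from clean
    images) can serve as classification targets.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def predict_code_indices(logits: List[Sequence[float]]) -> List[int]:
    """Code prediction: pick the highest-scoring codebook index per position.

    The logits are assumed to come from some predictor network (not modeled here).
    """
    return [max(range(len(row)), key=lambda k: row[k]) for row in logits]


def lookup(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Replace each index with its codebook entry (a 'visual atom')."""
    return [codebook[k] for k in indices]


if __name__ == "__main__":
    # Toy 2-D codebook with K = 3 entries (made-up values).
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
    print(nearest_code_indices(features, codebook))  # [0, 1, 2]
    logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]
    print(lookup(predict_code_indices(logits), codebook))  # [[1.0, 1.0], [0.0, 0.0]]
```

Because the output is always assembled from codebook entries, a prediction can only produce combinations of learned "atoms". This is the sense in which a discrete prior narrows the space of possible restorations.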

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

heatz123/naturalspeech 9 May 2022

In this paper, we answer these questions (how to define human-level quality, how to judge it, and how to build a system that reaches it) by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.

Speech Synthesis, Text-To-Speech Synthesis

55 stars (0.64 stars / hour)
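The NaturalSpeech excerpt defines human-level quality through the statistical significance of a subjective measure. Below is a minimal sketch of that kind of criterion. It assumes paired per-utterance listener scores for the TTS output and the matching human recording, and it calls a system human-level when a paired test finds no significant difference at level alpha. The test here is an exact two-sided sign test, chosen only because it fits in a few lines of standard-library Python. It is a stand-in and not necessarily the test the paper uses. The function names, the data, and the alpha = 0.05 default are assumptions.

```python
from math import comb
from typing import Sequence


def sign_test_p_value(system: Sequence[float], reference: Sequence[float]) -> float:
    """Exact two-sided sign test on paired scores; tied pairs are dropped."""
    if len(system) != len(reference):
        raise ValueError("system and reference scores must be paired")
    diffs = [s - r for s, r in zip(system, reference) if s != r]
    n = len(diffs)
    if n == 0:
        return 1.0  # every pair tied: no evidence of a difference
    wins = sum(1 for d in diffs if d > 0)
    tail = min(wins, n - wins)
    # Probability of a split at least this lopsided under H0 (p = 0.5), doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def is_human_level(
    system: Sequence[float], reference: Sequence[float], alpha: float = 0.05
) -> bool:
    """Hypothetical criterion: no statistically significant difference at level alpha."""
    return sign_test_p_value(system, reference) >= alpha


if __name__ == "__main__":
    # Made-up paired ratings for eight utterances (TTS vs. recording).
    tts = [4, 4, 5, 3, 4, 5, 4, 3]
    human = [4, 5, 4, 4, 5, 4, 4, 3]
    print(sign_test_p_value(tts, human))  # 1.0
    print(is_human_level(tts, human))  # True
```

Failing to reject the null hypothesis is not proof that the two are equivalent. With very few rated utterances almost any system passes, so a criterion like this is only meaningful with a reasonably large test set.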

Learning the Beauty in Songs: Neural Singing Voice Beautifier

MoonInTheRiver/DiffSinger ACL 2022

Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.

Dynamic Time Warping

1,880 stars (0.64 stars / hour)