DeepCache: Accelerating Diffusion Models for Free

horseee/deepcache 1 Dec 2023

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities.

Denoising, Image Generation

216
0.79 stars / hour

NEFTune: Noisy Embeddings Improve Instruction Finetuning

openaccess-ai-collective/axolotl 9 Oct 2023

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation.

Language Modelling

2,386
0.71 stars / hour

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

yuliang-liu/monkey 11 Nov 2023

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning, Question Answering, +2

606
0.69 stars / hour

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

skhu101/gauhuman 5 Dec 2023

We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1~2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks that demand hours of training and seconds of rendering per frame.

79
0.66 stars / hour

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

sh-lee-prml/hierspeechpp 21 Nov 2023

Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.

Speech Synthesis, Super-Resolution, +2

724
0.61 stars / hour

Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation

orhir/PoseAnything 29 Nov 2023

This approach not only enables object pose generation based on arbitrary keypoint definitions but also significantly reduces the associated costs, paving the way for versatile and adaptable pose estimation applications.

Animal Pose Estimation, Category-Agnostic Pose Estimation, +2

78
0.60 stars / hour

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

GongyeLiu/StyleCrafter 1 Dec 2023

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.

Disentanglement, Text-to-Video Generation, +1

104
0.58 stars / hour

Large Language Models on Graphs: A Comprehensive Survey

petergriffinjin/awesome-language-model-on-graphs 5 Dec 2023

Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graph scenarios (i.e., graph-based reasoning).

Language Modelling

156
0.56 stars / hour

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

epfllm/meditron 27 Nov 2023

Large language models (LLMs) can potentially democratize access to medical knowledge.

Ranked #1 on Multiple Choice Question Answering (MCQA) on MedMCQA (Dev Set (Acc-%) metric)

Conditional Text Generation, Multiple Choice Question Answering (MCQA)

1,142
0.56 stars / hour

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance

lllyasviel/fooocus ICCV 2023

Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity.

Denoising, Image Generation

24,733
0.54 stars / hour