InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation

gnobitab/instaflow 12 Sep 2023

Leveraging our new pipeline, we create, to the best of our knowledge, the first one-step diffusion-based text-to-image generator with SD-level image quality, achieving an FID (Fréchet Inception Distance) of $23.3$ on MS COCO 2017-5k, surpassing the previous state-of-the-art technique, progressive distillation, by a significant margin ($37.2 \rightarrow 23.3$ in FID).

550
0.99 stars / hour

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

opennlg/openba 19 Sep 2023

This report provides the main details to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspire our model architecture design, training objectives of different stages, and other enhancement techniques.

48
0.94 stars / hour

3D Gaussian Splatting for Real-Time Radiance Field Rendering

graphdeco-inria/gaussian-splatting 8 Aug 2023

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos.

Camera Calibration, Novel View Synthesis

4,213
0.93 stars / hour

Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications

zhudotexe/kani 11 Sep 2023

Language model applications are becoming increasingly popular and complex, often including features like tool usage and retrieval augmentation.

Language Modelling, Management, +1

209
0.92 stars / hour

DreamLLM: Synergistic Multimodal Comprehension and Creation

RunpeiDong/DreamLLM 20 Sep 2023

This paper presents DreamLLM, the first learning framework to achieve versatile Multimodal Large Language Models (MLLMs) empowered by the frequently overlooked synergy between multimodal comprehension and creation.

44
0.79 stars / hour

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

projectnuwa/layoutnuwa 18 Sep 2023

Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception.

Code Completion, Code Generation

58
0.79 stars / hour

GPT Can Solve Mathematical Problems Without a Calculator

thudm/mathglm 6 Sep 2023

Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools.

Language Modelling

176
0.73 stars / hour

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

haozhezhao/mic 14 Sep 2023

Specifically, current VLMs primarily emphasize multi-modal data with a single image, rather than multi-modal prompts that interleave multiple images and text.

Language Modelling

165
0.67 stars / hour

EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation

mit-han-lab/efficientvit 29 May 2022

Unlike prior semantic segmentation models that rely on heavy self-attention, hardware-inefficient large-kernel convolution, or complicated topology to obtain good performance, our lightweight multi-scale attention achieves a global receptive field and multi-scale learning (two critical features for semantic segmentation models) with only lightweight and hardware-efficient operations.

Autonomous Driving, Image Classification, +3

369
0.66 stars / hour

Petals: Collaborative Inference and Fine-tuning of Large Models

bigscience-workshop/petals 2 Sep 2022

However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research that requires access to weights, attention or logits.

Collaborative Inference

7,517
0.59 stars / hour