CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

salesforce/codetf · 31 May 2023

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

237 stars · 6.07 stars / hour

Gorilla: Large Language Model Connected with Massive APIs

ShishirPatil/gorilla · 24 May 2023

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis.

Tasks: Language Modelling, Mathematical Reasoning (+2)

2,988 stars · 5.57 stars / hour

Let's Verify Step by Step

openai/prm800k · Preprint 2023

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

 Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Tasks: Active Learning, Math Word Problem Solving (+1)

621 stars · 4.89 stars / hour
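To make the contrast the abstract draws concrete: outcome supervision rewards only the final answer, while process supervision has a process reward model (PRM) rate every intermediate reasoning step, with a full solution typically scored by the product of its per-step probabilities. The sketch below is a hypothetical illustration in that spirit; the function names and the toy PRM are not the paper's code.

    def outcome_reward(solution, correct_answer):
        # Outcome supervision: one binary signal for the final answer only.
        return 1.0 if solution["answer"] == correct_answer else 0.0

    def process_reward(solution, prm_score):
        # Process supervision: a PRM rates each step; the solution is scored
        # by the product of per-step correctness probabilities.
        p = 1.0
        for step in solution["steps"]:
            p *= prm_score(step)
        return p

    solution = {"steps": ["2x = 6", "x = 3"], "answer": "3"}
    print(outcome_reward(solution, "3"))            # 1.0
    print(process_reward(solution, lambda s: 0.9))  # 0.81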

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

mit-han-lab/llm-awq · 1 Jun 2023

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).

Tasks: Common Sense Reasoning, Language Modelling (+1)

135 stars · 3.33 stars / hour
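As a rough, simplified sketch of the activation-aware idea (not the paper's actual grouping or scale-search procedure): weight columns are scaled by a power of the mean activation magnitude observed on calibration data before low-bit rounding, so the most activation-salient channels lose less precision, and the inverse scale is folded back in afterwards. The function name, `alpha`, and the per-row quantizer here are all assumptions for illustration.

    import numpy as np

    def awq_quantize_sketch(W, X, n_bits=4, alpha=0.5):
        """Hypothetical activation-aware quantization of W (out, in)
        given calibration activations X (n_samples, in)."""
        # Per-input-channel saliency from calibration activations.
        act_scale = np.mean(np.abs(X), axis=0)             # (in,)
        s = np.power(act_scale + 1e-8, alpha)              # smoothed channel scale
        W_scaled = W * s                                   # amplify salient columns
        # Simple symmetric per-row quantization of the scaled weights.
        qmax = 2 ** (n_bits - 1) - 1
        step = np.max(np.abs(W_scaled), axis=1, keepdims=True) / qmax
        W_q = np.clip(np.round(W_scaled / step), -qmax - 1, qmax)
        # Fold the inverse scale back: x @ W_deq.T approximates x @ W.T,
        # with less rounding error on high-activation channels.
        W_deq = W_q * step / s
        return W_q.astype(np.int8), W_deq

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    X = rng.standard_normal((64, 16)) * np.linspace(0.1, 3.0, 16)
    _, W_deq = awq_quantize_sketch(W, X)
    print(float(np.max(np.abs(W - W_deq))))                # 4-bit rounding error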

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera · 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

Tasks: Video Recognition

116 stars · 3.10 stars / hour

Large Language Models as Tool Makers

ctlllll/llm-toolmaker · 26 May 2023

Our approach consists of two key phases: 1) tool making: an LLM acts as the tool maker that crafts tools for given tasks, where a tool is implemented as a Python utility function; and 2) tool using: an LLM acts as the tool user that applies the tool maker's tools to handle new requests.

608 stars · 3.05 stars / hour
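Here is a hypothetical example of what a "tool" looks like under that definition: the tool maker emits a plain, reusable Python utility function, and in the tool-using phase a (typically cheaper) model just calls it on new task instances instead of re-deriving the solution. The task and function name are invented for illustration.

    # Tool-making phase: the tool-maker LLM writes a reusable utility function.
    def find_common_slot(slots_a, slots_b):
        """Return the earliest time slot present in both availability lists."""
        common = sorted(set(slots_a) & set(slots_b))
        return common[0] if common else None

    # Tool-using phase: later queries simply call the cached tool.
    print(find_common_slot(["09:00", "11:00", "14:00"], ["10:00", "11:00"]))  # 11:00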

Generating Sequences With Recurrent Neural Networks

sjvasquez/handwriting-synthesis · 4 Aug 2013

This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time.

Tasks: Language Modelling, Text Generation

3,400 stars · 2.29 stars / hour
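The "one data point at a time" loop is the heart of the method: at each step the network outputs a distribution over the next point, a point is sampled from it, and the sample is fed back in as the next input. The sketch below keeps only that loop, with a stub in place of the real LSTM-plus-mixture-density step, so the feedback structure itself is runnable.

    import numpy as np

    def step(x, h):
        """Stub for one recurrent step; the real model is an LSTM emitting
        mixture-density parameters. Returns (output params, new state)."""
        h = np.tanh(h + x)                               # placeholder recurrence
        return h, h

    def sample_sequence(length, dim=3, rng=np.random.default_rng(0)):
        x = np.zeros(dim)                                # initial input point
        h = np.zeros(dim)                                # initial hidden state
        points = []
        for _ in range(length):
            params, h = step(x, h)
            x = params + 0.1 * rng.standard_normal(dim)  # sample the next point
            points.append(x)                             # ...and feed it back
        return np.stack(points)

    print(sample_sequence(5).shape)                      # (5, 3)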

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

threestudio-project/threestudio · 25 May 2023

In this work, we propose to model the 3D parameter as a random variable instead of a constant (as in SDS) and present variational score distillation (VSD), a principled particle-based variational framework that explains and addresses the aforementioned issues in text-to-3D generation.

Tasks: Text to 3D

1,250 stars · 2.28 stars / hour
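In symbols (paraphrasing the usual presentation, so treat the exact notation as an assumption rather than the paper's): SDS nudges noised renders x_t of g(θ) toward the pretrained score using the injected Gaussian noise as the baseline, whereas VSD swaps that baseline for the score of a learned distribution over θ, estimated with a fine-tuned (e.g., LoRA) copy of the diffusion model:

    \nabla_\theta \mathcal{L}_{\mathrm{SDS}} \approx
        \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
        \big(\epsilon_{\mathrm{pretrain}}(x_t; y, t) - \epsilon\big)\,
        \tfrac{\partial g(\theta)}{\partial \theta} \right]

    \nabla_\theta \mathcal{L}_{\mathrm{VSD}} \approx
        \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
        \big(\epsilon_{\mathrm{pretrain}}(x_t; y, t) - \epsilon_{\mathrm{lora}}(x_t; y, t)\big)\,
        \tfrac{\partial g(\theta)}{\partial \theta} \right]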

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

icoz69/styleavatar3d · 30 May 2023

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.

260 stars · 2.13 stars / hour

Fine-Tuning Language Models with Just Forward Passes

princeton-nlp/mezo · 27 May 2023

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

Tasks: Multiple-choice

262 stars · 1.80 stars / hour
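The memory saving comes from never running backpropagation: gradients are estimated from two forward passes along a random direction (SPSA-style zeroth-order optimization), and the paper additionally regenerates the perturbation from a stored RNG seed so memory stays at inference level. Below is a minimal sketch of that estimator on a toy quadratic; the names and hyperparameters are illustrative, not the released code.

    import numpy as np

    def zo_step(theta, loss, lr=5e-3, eps=1e-3, seed=0):
        """One zeroth-order update: two forward passes, no gradients stored."""
        z = np.random.default_rng(seed).standard_normal(theta.shape)
        l_plus = loss(theta + eps * z)        # forward pass 1
        l_minus = loss(theta - eps * z)       # forward pass 2
        g = (l_plus - l_minus) / (2 * eps)    # directional gradient estimate
        return theta - lr * g * z             # SGD step along z

    target = np.array([1.0, -2.0, 0.5, 3.0])
    loss = lambda t: np.sum((t - target) ** 2)
    theta = np.zeros(4)
    for i in range(3000):
        theta = zo_step(theta, loss, seed=i)
    print(np.round(theta, 2))                 # converges toward `target`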