Large Language Model

248 papers with code • 0 benchmarks • 0 datasets


Libraries

Use these libraries to find Large Language Model implementations

Most implemented papers

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

salesforce/CodeGen 25 Mar 2022

To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open-source the training library JAXFORMER.
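
The released checkpoints can be loaded through Hugging Face transformers for single-turn sampling. A minimal sketch follows; the checkpoint id and prompt are illustrative assumptions, not prescribed by the paper.

```python
# Minimal sketch: single-turn program synthesis with a released CodeGen
# checkpoint via Hugging Face transformers. The checkpoint id below is
# an assumption; substitute any CodeGen variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "# Python function that reverses a string\ndef reverse_string(s):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```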

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

alibaba/AliceMind EMNLP 2021

Recent pretrained language models have scaled from millions to billions of parameters.

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

project-baize/baize-chatbot 3 Apr 2023

Furthermore, we propose a new technique called Self-Distill with Feedback to further improve the performance of the Baize models with feedback from ChatGPT.
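
The shape of such a self-distillation round is sketched below under loose assumptions: sample several candidate responses, let a feedback model pick the best, and keep the pair for the next fine-tune. All helpers here are toy stand-ins, not the Baize repository's API.

```python
import random

def generate(model, prompt):
    # Stand-in: sample one candidate response from the chat model.
    return f"response-{random.randint(0, 999)} to: {prompt}"

def feedback_rank(prompt, candidates):
    # Stand-in for the feedback model (the paper queries ChatGPT as judge).
    return max(candidates, key=len)

def self_distill_round(model, prompts, n_candidates=4):
    # Keep (prompt, best candidate) pairs as data for the next fine-tune.
    return [(p, feedback_rank(p, [generate(model, p) for _ in range(n_candidates)]))
            for p in prompts]

print(self_distill_round(model=None, prompts=["How do I sort a list in Python?"]))
```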

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

vision-cair/minigpt-4 20 Apr 2023

To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer.
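
The alignment described here reduces to a single trainable linear layer mapping frozen visual-encoder features into the frozen LLM's embedding space. A sketch follows; the dimensions are illustrative assumptions, not MiniGPT-4's actual configuration.

```python
import torch
import torch.nn as nn

class VisionToLLMProjection(nn.Module):
    """One linear layer aligning frozen vision features to the LLM space."""
    def __init__(self, vision_dim=1408, llm_dim=4096):  # assumed dims
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)  # the only trainable part

    def forward(self, visual_feats):       # (batch, n_patches, vision_dim)
        return self.proj(visual_feats)     # (batch, n_patches, llm_dim)

visual_feats = torch.randn(1, 32, 1408)    # stub for frozen encoder output
soft_prompt = VisionToLLMProjection()(visual_feats)
# `soft_prompt` is prepended to the text embeddings fed to the frozen LLM.
```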

Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation

alipay/StructuredLM_RTDT 1 Mar 2022

We propose to use a top-down parser as a model-based pruning method, which also enables parallel encoding during inference.
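
The pruning idea can be illustrated with a toy greedy splitter (this is not the paper's code): a top-down parser chooses one split per span, so the bottom-up encoder composes only the spans on that binary tree rather than every CKY chart cell.

```python
def top_down_spans(score_split, i, j, spans):
    """Collect the spans kept by greedy top-down splitting of [i, j)."""
    spans.append((i, j))
    if j - i <= 1:
        return
    # Pick the best split point according to the (learned) parser's score.
    k = max(range(i + 1, j), key=lambda k: score_split(i, k, j))
    top_down_spans(score_split, i, k, spans)
    top_down_spans(score_split, k, j, spans)

# Toy split scorer standing in for the learned top-down parser.
score = lambda i, k, j: -abs((i + j) / 2 - k)  # prefer balanced splits
kept = []
top_down_spans(score, 0, 8, kept)
print(kept)  # only these spans are composed bottom-up, level by level
```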

Elixir: Train a Large Language Model on a Small GPU Cluster

hpcaitech/colossalai 10 Dec 2022

To reduce GPU memory usage, memory partitioning and memory offloading have been proposed.
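
The offloading half of that idea, in its simplest form: keep weights in host memory and page them onto the GPU only for the layer's forward pass. A minimal PyTorch sketch, not Elixir's implementation:

```python
import torch

def offloaded_forward(layer, x, device="cuda"):
    layer.to(device)            # page the layer's weights onto the GPU
    out = layer(x.to(device))   # compute on GPU
    layer.to("cpu")             # evict weights back to host memory
    return out

layer = torch.nn.Linear(1024, 1024)   # starts on CPU
if torch.cuda.is_available():         # guard so the sketch runs anywhere
    y = offloaded_forward(layer, torch.randn(4, 1024))
```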

Emergent Analogical Reasoning in Large Language Models

taylorwwebb/emergent_analogies_llm 19 Dec 2022

In human cognition, this capacity is closely tied to an ability to reason by analogy.

Muse: Text-To-Image Generation via Masked Generative Transformers

lucidrains/muse-pytorch 2 Jan 2023

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
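
The parallel decoding Muse relies on can be sketched in MaskGIT style: every step predicts all masked positions at once, commits the most confident ones, and re-masks the rest, so generation takes a few iterations instead of one per token. The predictor below is a random stand-in for the transformer.

```python
import torch

def parallel_decode(predict, seq_len, vocab, steps=8, mask_id=-1):
    tokens = torch.full((seq_len,), mask_id)
    for step in range(steps):
        probs = predict(tokens).softmax(-1)          # (seq_len, vocab)
        conf, pred = probs.max(-1)
        conf[tokens != mask_id] = float("inf")       # keep committed tokens
        n_keep = int(seq_len * (step + 1) / steps)   # commit schedule
        keep = conf.topk(n_keep).indices
        new = torch.full_like(tokens, mask_id)
        new[keep] = torch.where(tokens[keep] != mask_id, tokens[keep], pred[keep])
        tokens = new
    return tokens

out = parallel_decode(lambda t: torch.randn(t.numel(), 16), seq_len=12, vocab=16)
print(out)
```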

PaLM-E: An Embodied Multimodal Language Model

kyegomez/PALM-E 6 Mar 2023

Large language models excel at a wide range of complex tasks.

OpenICL: An Open-Source Framework for In-context Learning

shark-nlp/openicl 6 Mar 2023

However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks.
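
The pipeline being abstracted has a common shape regardless of method: retrieve demonstrations, format them into a prompt, then run inference. The sketch below shows that generic shape with a toy word-overlap retriever; it is not OpenICL's actual API.

```python
def build_icl_prompt(query, train_set, retrieve, k=3,
                     template="Input: {x}\nLabel: {y}\n\n"):
    demos = retrieve(query, train_set, k)   # e.g. top-k by similarity
    prompt = "".join(template.format(x=x, y=y) for x, y in demos)
    return prompt + f"Input: {query}\nLabel:"

def overlap_retrieve(query, train_set, k):
    # Toy retriever: rank training examples by word overlap with the query.
    score = lambda ex: len(set(query.split()) & set(ex[0].split()))
    return sorted(train_set, key=score, reverse=True)[:k]

train = [("the movie was great", "positive"), ("awful plot", "negative"),
         ("great acting, great film", "positive")]
print(build_icl_prompt("a great movie", train, overlap_retrieve, k=2))
```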