Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

openai/transformer-debugger 1 Nov 2022

Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.

Language Modelling

3,853
0.29 stars / hour

DoRA: Weight-Decomposed Low-Rank Adaptation

NVlabs/DoRA 14 Feb 2024

By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.

211
0.29 stars / hour

SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning

nzolman/sindy-rl 14 Mar 2024

Deep reinforcement learning (DRL) has shown significant promise for uncovering sophisticated control policies that interact in environments with complicated dynamics, such as stabilizing the magnetohydrodynamics of a tokamak fusion reactor or minimizing the drag force exerted on an object in a fluid flow.

Dictionary Learning, Model-based Reinforcement Learning +1

43
0.27 stars / hour

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

csyxwei/masterweaver 9 May 2024

In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.

Text-to-Image Generation

79
0.27 stars / hour

Perseus: Removing Energy Bloat from Large Model Training

ml-energy/zeus 12 Dec 2023

Training large AI models on numerous GPUs consumes a massive amount of energy.

167
0.27 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot, Instruction Following +2

2,768
0.26 stars / hour

EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider ACM The Web Conference 2023

As such, web-crawling is an essential tool for both computational and non-computational scientists to conduct research.

Data Integration, Marketing

24,708
0.26 stars / hour

UniTable: Towards a Unified Framework for Table Structure Recognition via Self-Supervised Pretraining

poloclub/unitable 7 Mar 2024

Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse.

Language Modelling

183
0.26 stars / hour

HMT: Hierarchical Memory Transformer for Long Context Language Processing

OswaldHe/HMT-pytorch 9 May 2024

With an additional 0.5% - 2% of parameters, HMT can easily plug in and augment future LLMs to handle long context effectively.

Language Modelling, Memorization +1

33
0.25 stars / hour

Zero-Shot Tokenizer Transfer

bminixhofer/zett 13 May 2024

Finally, we show that a ZeTT hypernetwork trained for a base (L)LM can also be applied to fine-tuned variants without extra training.

XLM-R

79
0.25 stars / hour