Given the combinatorial complexity, and thus the computation time, of hand-tuning prompts for large black-box models, we then compared the performance of the best "positive thinking" prompt against the output of systematic prompt optimization.
In light of this, we propose a novel Decoupled Space-Time Aggregation network (DSTA) to separately capture the spatial contexts between adjacent joints and the temporal cues of each individual joint, thereby avoiding the conflation of spatiotemporal dimensions.
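As a rough illustration of the decoupling idea, a spatial branch can mix features across joints within each frame while a temporal branch operates over frames for each joint. This is a conceptual sketch with hypothetical module names, not the authors' DSTA implementation:

```python
import torch
import torch.nn as nn

class DecoupledAggregation(nn.Module):
    """Conceptual sketch: separate spatial and temporal aggregation branches."""

    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        # Spatial branch: mixes features across joints within each frame.
        self.spatial = nn.Linear(num_joints, num_joints)
        # Temporal branch: per-joint 1D convolution over the frame axis.
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, joints)
        b, c, t, v = x.shape
        spatial_cues = self.spatial(x)  # mixes only the last (joint) dimension
        per_joint = x.permute(0, 3, 1, 2).reshape(b * v, c, t)
        temporal_cues = (
            self.temporal(per_joint).reshape(b, v, c, t).permute(0, 2, 3, 1)
        )
        # Fused additively here; the actual fusion in DSTA may differ.
        return spatial_cues + temporal_cues
```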
Here we show that smaller LMs trained using a subset of the layers of GPT-2 medium (355M) and GPT-2 large (770M) can effectively match the validation loss of their bigger counterparts trained from scratch for the same number of training steps on the OpenWebText dataset (9B tokens).
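One plausible way to reuse layers from a pretrained GPT-2 in a smaller model, sketched with the Hugging Face transformers API (the paper's exact recipe may differ; the layer count k is an illustrative choice):

```python
from transformers import GPT2Config, GPT2LMHeadModel

big = GPT2LMHeadModel.from_pretrained("gpt2-medium")  # 355M params, 24 layers
k = 12  # number of layers to inherit (hypothetical choice)

# Build a smaller model whose config matches the parent except for depth.
small_cfg = GPT2Config.from_pretrained("gpt2-medium", n_layer=k)
small = GPT2LMHeadModel(small_cfg)

# Copy the embeddings and the first k transformer blocks from the parent.
small.transformer.wte.load_state_dict(big.transformer.wte.state_dict())
small.transformer.wpe.load_state_dict(big.transformer.wpe.state_dict())
for i in range(k):
    small.transformer.h[i].load_state_dict(big.transformer.h[i].state_dict())
small.transformer.ln_f.load_state_dict(big.transformer.ln_f.state_dict())
# `small` is then trained on OpenWebText as usual.
```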
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
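The headline finding is that parameters and tokens should scale roughly equally with compute. Using the common approximations C ≈ 6ND and D ≈ 20N (tokens per parameter), a back-of-the-envelope sizing looks like this; the coefficients are the folklore rounding of the paper's fitted scaling laws, not exact values:

```python
def compute_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    # C = 6 * N * D and D = r * N  =>  N = sqrt(C / (6 * r))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(5.76e23)  # roughly Chinchilla's training budget
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```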
We first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators.
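For concreteness, the most common pattern in such a taxonomy, query-based augmentation, prepends retrieved passages to the generator's prompt. A minimal sketch, with `embed` and `generate` as hypothetical stand-ins for a real retriever/generator pair:

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 3):
    # Cosine similarity between the query and every document embedding.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [docs[i] for i in top]

def rag_answer(query: str, docs: list[str], embed, generate) -> str:
    # Retrieve the top-k passages and splice them into the generator's prompt.
    doc_vecs = np.stack([embed(d) for d in docs])
    context = "\n".join(retrieve(embed(query), doc_vecs, docs))
    return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```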
PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages.
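A short usage sketch of the PySRRegressor interface (the hyperparameters are illustrative, not tuned):

```python
import numpy as np
from pysr import PySRRegressor

X = np.random.randn(200, 2)
y = 2.5 * np.cos(X[:, 0]) + X[:, 1] ** 2  # toy target to recover

model = PySRRegressor(
    niterations=40,                     # search budget
    binary_operators=["+", "-", "*"],
    unary_operators=["cos"],
)
model.fit(X, y)
print(model)  # prints the discovered Pareto front of equations
```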
The results demonstrate that Vim is capable of overcoming the computation and memory constraints of performing Transformer-style understanding on high-resolution images, and that it has great potential to be the next-generation backbone for vision foundation models.
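The constraint at issue is the quadratic cost of self-attention in token count, which grows quickly at high resolution, whereas an SSM-style scan grows linearly. A quick, illustrative calculation (constants omitted):

```python
# Attention materializes an L x L map per head per layer; a scan is O(L).
patch = 16
for side in (224, 1024):
    tokens = (side // patch) ** 2
    attn_entries = tokens ** 2
    print(f"{side}x{side}px -> {tokens} tokens, "
          f"attention map ~ {attn_entries:,} entries vs {tokens:,} scan steps")
# 224px -> 196 tokens (~38k entries); 1024px -> 4096 tokens (~16.8M entries).
```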
However, LoRA approximates ΔW as the product of two matrices: A, initialized with Gaussian noise, and B, initialized with zeros. PiSSA instead initializes A and B with the principal singular values and singular vectors of the original matrix W, and can therefore better approximate full-parameter fine-tuning from the outset by updating the essential parts of W while freezing the "noisy" parts.
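A compact sketch of this initialization as described (the variable names and the residual-plus-adapter split are our reading; consult the PiSSA paper or repository for the exact parameterization):

```python
import torch

def pissa_init(W: torch.Tensor, r: int):
    # W: (out_features, in_features), the pretrained weight matrix.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_S = torch.sqrt(S[:r])
    A = torch.diag(sqrt_S) @ Vh[:r]    # (r, in)  -- trainable
    B = U[:, :r] @ torch.diag(sqrt_S)  # (out, r) -- trainable
    W_res = W - B @ A                  # frozen "noisy" residual
    return A, B, W_res

# The layer then computes W_res + B @ A, which equals W exactly at init,
# while gradients flow only through the principal components B and A.
```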
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have long been the predominant backbone networks for visual representation learning.