Trending Research

The Unreasonable Effectiveness of Eccentric Automatic Prompts

stanfordnlp/dspy • 9 Feb 2024

Given the combinatorial complexity, and thus computation time, of experimenting with hand-tuning prompts for large black-box models, we then compared the performance of the best "positive thinking" prompt against the output of systematic prompt optimization.

Ranked #104 on Arithmetic Reasoning on GSM8K

Arithmetic Reasoning, GSM8K

9,976 stars

0.39 stars / hour

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

zhengpeng7/birefnet • 7 Jan 2024

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation, Dichotomous Image Segmentation, +3

124 stars

0.39 stars / hour

LLoCO: Learning Long Contexts Offline

jeffreysijuntan/lloco • 11 Apr 2024

We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.

4k, In-Context Learning, +1

0.38 stars / hour

High-Fidelity Audio Compression with Improved RVQGAN

descriptinc/descript-audio-codec • NeurIPS 2023

Language models have been successfully used to model natural signals, such as images, speech, and music.

Audio Compression, Audio Generation, +1

816 stars

0.38 stars / hour

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

idea-research/groundingdino • 9 Mar 2023

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.
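
The following is a minimal sketch of the language-guided query selection step only. The feature enhancer and the cross-modality decoder are not shown. Each image token is scored by its highest dot-product similarity with any text token, and the indices of the top-scoring tokens are kept to seed the decoder queries. The function name, the plain-list feature format, and the toy inputs are hypothetical illustrations, not the groundingdino repository's API.

```python
def language_guided_query_selection(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most relevant to the text prompt.

    image_feats: list of N_I feature vectors (each a list of d floats).
    text_feats:  list of N_T feature vectors (same dimension d).
    Hypothetical helper: scores every image token by the max dot product
    over all text tokens, then keeps the top `num_queries` indices.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Max over the text dimension of the image-text similarity matrix.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens by score, highest first, and keep the top ones.
    order = sorted(range(len(image_feats)), key=lambda i: scores[i], reverse=True)
    return order[:num_queries]


# Toy 2-D features (illustrative only).
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.0, 0.2]]
selected = language_guided_query_selection(image_feats, text_feats, num_queries=2)
print(selected)
```

In the toy input, image tokens 0 and 2 align best with the first text token, so they are the ones kept.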

Ranked #1 on Zero-Shot Object Detection on MSCOCO

Referring Expression, Referring Expression Comprehension, +2

4,904 stars

0.38 stars / hour

VMamba: Visual State Space Model

mzeromiko/vmamba • 18 Jan 2024

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have long been the predominant backbone networks for visual representation learning.

Computational Efficiency, Representation Learning

1,335 stars

0.37 stars / hour

What Makes Good In-Context Examples for GPT-3?

stanfordnlp/dsp • 17 Jan 2021

Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically similar to a test sample to formulate its corresponding prompt.
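
The following is a minimal sketch of retrieval-based in-context example selection. The k training examples whose embeddings have the highest cosine similarity to the test input are chosen and concatenated into a few-shot prompt. The sketch rests on several assumptions. The embeddings are toy 2-D vectors standing in for the output of a sentence encoder. The helper names (`select_knn_examples`, `build_prompt`) and the "Input:/Output:" prompt format are hypothetical. Placing the most similar example first is one ordering choice among several, not necessarily the paper's.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_knn_examples(test_vec, pool, k):
    """Return the k pool entries most similar to test_vec, most similar first.

    pool: list of (embedding, input_text, output_text) tuples (hypothetical format).
    """
    ranked = sorted(pool, key=lambda ex: cosine(test_vec, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(test_input, examples):
    """Concatenate the selected demonstrations, then the unanswered test input."""
    blocks = [f"Input: {x}\nOutput: {y}" for _, x, y in examples]
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)


# Toy labeled pool with made-up 2-D "embeddings".
pool = [
    ([1.0, 0.0], "great movie", "positive"),
    ([0.0, 1.0], "terrible plot", "negative"),
    ([0.9, 0.1], "loved it", "positive"),
]
test_vec = [0.8, 0.2]  # assumed embedding of the test input below
chosen = select_knn_examples(test_vec, pool, k=2)
prompt = build_prompt("fantastic acting", chosen)
print(prompt)
```

Because the examples are picked per test input rather than fixed once, each query receives demonstrations that resemble it. This per-input retrieval is the core of the approach described in the snippet.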

Few-Shot Learning, Natural Language Understanding, +4

9,984 stars

0.37 stars / hour

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

InternLM/InternLM-XComposer • 22 Feb 2024

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

1,518 stars

0.35 stars / hour