The Unreasonable Effectiveness of Eccentric Automatic Prompts

stanfordnlp/dspy 9 Feb 2024

Given the combinatorial complexity, and thus computation time, of experimenting with hand-tuning prompts for large black-box models, we then compared the performance of the best "positive thinking" prompt against the output of systematic prompt optimization.

Arithmetic Reasoning, GSM8K

9,976
0.39 stars / hour

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

zhengpeng7/birefnet 7 Jan 2024

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation, Dichotomous Image Segmentation, +3

124
0.39 stars / hour

LLoCO: Learning Long Contexts Offline

jeffreysijuntan/lloco 11 Apr 2024

We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.

4k, In-Context Learning, +1

38
0.38 stars / hour

High-Fidelity Audio Compression with Improved RVQGAN

descriptinc/descript-audio-codec NeurIPS 2023

Language models have been successfully used to model natural signals, such as images, speech, and music.

Audio Compression, Audio Generation, +1

816
0.38 stars / hour

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

idea-research/groundingdino 9 Mar 2023

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Referring Expression, Referring Expression Comprehension, +2

4,904
0.38 stars / hour

VMamba: Visual State Space Model

mzeromiko/vmamba 18 Jan 2024

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have long been the predominant backbone networks for visual representation learning.

Computational Efficiency, Representation Learning

1,335
0.37 stars / hour

What Makes Good In-Context Examples for GPT-3?

stanfordnlp/dsp 17 Jan 2021

Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt.

Few-Shot Learning, Natural Language Understanding, +4

9,984
0.37 stars / hour

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

InternLM/InternLM-XComposer 22 Feb 2024

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

1,518
0.35 stars / hour

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

NVlabs/FoundationPose 13 Dec 2023

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.

3D Object Detection, 3D Object Tracking, +7

813
0.35 stars / hour

CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion

haha-lisa/CreativeSynth 25 Jan 2024

Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.

Image Generation, Style Transfer

41
0.34 stars / hour