Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

nianticlabs/mickey 9 Apr 2024

Usually, correspondences are 2D-to-2D and the pose we estimate is defined only up to scale.

255
0.51 stars / hour
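The MicKey entry notes that 2D-to-2D correspondences fix the relative pose only up to scale. The sketch below checks this numerically with a generic two-view setup. It illustrates the scale ambiguity itself and is not MicKey's method. The scene, the pose, and the helper names (`project`, `correspondences`, `matches_equal`, `scaled_matches`) are all hypothetical.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ZERO = [0.0, 0.0, 0.0]

def project(point, rotation, translation):
    """Pinhole projection to normalized image coordinates, camera frame X_c = R X + t."""
    cam = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i] for i in range(3)]
    return (cam[0] / cam[2], cam[1] / cam[2])

def correspondences(points, rotation, translation):
    """2D-to-2D matches between a reference camera (R = I, t = 0) and a second camera."""
    return [(project(p, IDENTITY, ZERO), project(p, rotation, translation)) for p in points]

def matches_equal(a, b, tol=1e-9):
    """True if two lists of 2D-to-2D matches agree coordinate by coordinate."""
    return all(
        math.isclose(u, v, rel_tol=tol, abs_tol=tol)
        for (a1, a2), (b1, b2) in zip(a, b)
        for u, v in zip(a1 + a2, b1 + b2)
    )

# Hypothetical scene: second camera rotated 10 degrees about y and shifted sideways.
angle = math.radians(10.0)
rotation = [[math.cos(angle), 0.0, math.sin(angle)],
            [0.0, 1.0, 0.0],
            [-math.sin(angle), 0.0, math.cos(angle)]]
translation = [-0.5, 0.0, 0.1]
points = [[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.2, 0.8, 5.0]]

def scaled_matches(s):
    """Matches after scaling both the scene points and the baseline by s."""
    return correspondences([[s * c for c in p] for p in points], rotation,
                           [s * c for c in translation])

base = correspondences(points, rotation, translation)
for s in (0.5, 2.0, 10.0):
    # The 2D matches are identical, so they cannot reveal the metric baseline length.
    print(f"scale {s}: same matches = {matches_equal(base, scaled_matches(s))}")
```

Scaling every scene point and the translation by the same factor s multiplies each camera-frame coordinate by s, and that factor cancels in the perspective division. Recovering the metric scale therefore needs information beyond the 2D matches themselves.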

SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

letterligo/text-agnostic-governance 10 Apr 2024

The key idea is to eliminate unsafe visual representations from the model regardless of the text input.

72
0.50 stars / hour

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Arithmetic Reasoning Code Generation +6

155
0.50 stars / hour

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Beomi/InfiniTransformer 10 Apr 2024

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.

Book summarization Language Modelling +1

59
0.50 stars / hour
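The Infini-attention entry's claim of bounded memory for unbounded inputs rests on a compressive memory whose size does not grow with sequence length. The sketch below assumes the linear-attention-style memory described in the Infini-attention paper. The feature map is σ(x) = ELU(x) + 1, the memory is updated as M + σ(K)ᵀV, the normalizer as z + Σσ(k), and retrieval computes σ(q)M / (σ(q)·z). The sketch omits the local attention, the learned gate, the multi-head layout, and the delta-rule variant. It is not code from Beomi/InfiniTransformer, and the class name `CompressiveMemory` and the random data are hypothetical.

```python
import math
import random

def elu_plus_one(x):
    """sigma(x) = ELU(x) + 1, which keeps every feature strictly positive."""
    return x + 1.0 if x > 0 else math.exp(x)

class CompressiveMemory:
    """Fixed-size associative memory: d_key x d_value matrix M plus a d_key normalizer z."""

    def __init__(self, d_key, d_value):
        self.memory = [[0.0] * d_value for _ in range(d_key)]
        self.norm = [0.0] * d_key

    def retrieve(self, query):
        """A_mem = sigma(q) M / (sigma(q) . z); returns zeros while memory is empty."""
        q = [elu_plus_one(x) for x in query]
        denom = sum(qi * zi for qi, zi in zip(q, self.norm))
        d_value = len(self.memory[0])
        if denom == 0.0:
            return [0.0] * d_value
        return [sum(q[i] * self.memory[i][j] for i in range(len(q))) / denom
                for j in range(d_value)]

    def update(self, keys, values):
        """Fold one segment into memory: M += sigma(K)^T V, z += sum_t sigma(k_t)."""
        for key, value in zip(keys, values):
            sk = [elu_plus_one(x) for x in key]
            for i, ski in enumerate(sk):
                self.norm[i] += ski
                for j, vj in enumerate(value):
                    self.memory[i][j] += ski * vj

    def state_size(self):
        """Number of stored floats; independent of how many tokens were processed."""
        return len(self.memory) * len(self.memory[0]) + len(self.norm)

# Stream 50 hypothetical segments of 128 random tokens each through the memory.
rng = random.Random(0)
d_key, d_value, segment_len = 8, 4, 128
mem = CompressiveMemory(d_key, d_value)
for _ in range(50):
    keys = [[rng.gauss(0, 1) for _ in range(d_key)] for _ in range(segment_len)]
    values = [[rng.gauss(0, 1) for _ in range(d_value)] for _ in range(segment_len)]
    mem.update(keys, values)
print(mem.state_size())  # 40 floats after 6,400 tokens
print(mem.retrieve([rng.gauss(0, 1) for _ in range(d_key)]))
```

The state stays at d_key × d_value + d_key floats however many segments are folded in. This is what keeps memory and per-segment compute bounded. The cost is that older tokens are only available in this lossy, summed form.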

ReFT: Representation Finetuning for Language Models

stanfordnlp/pyreft 4 Apr 2024

LoReFT is a drop-in replacement for existing parameter-efficient finetuning methods (PEFTs) and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

525
0.50 stars / hour
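LoReFT's parameter efficiency comes from editing hidden representations inside a low-rank subspace rather than updating model weights. The sketch below follows the intervention as stated in the ReFT paper, Φ(h) = h + Rᵀ(Wh + b − Rh), where R is an r × d matrix with orthonormal rows, W is an r × d matrix, and b is an r-vector. It is plain Python for illustration and is not the pyreft API. The helper names and toy values are hypothetical.

```python
import math

def matvec(m, v):
    """m @ v for a row-major matrix m."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matvec_t(m, v):
    """m^T @ v, where v has one entry per row of m."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]

def loreft(h, R, W, b):
    """LoReFT edit: Phi(h) = h + R^T (W h + b - R h)."""
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]   # W h + b
    diff = [t - rh for t, rh in zip(target, matvec(R, h))]  # (W h + b) - R h
    return [hi + di for hi, di in zip(h, matvec_t(R, diff))]

def loreft_param_count(d, r):
    """Trainable parameters per intervention: R (r x d) + W (r x d) + b (r)."""
    return 2 * r * d + r

# Toy example: hidden size d = 4, rank r = 2, R with orthonormal rows.
s = 1 / math.sqrt(2)
R = [[s, s, 0.0, 0.0],
     [0.0, 0.0, s, -s]]
W = [[0.1, 0.0, 0.3, 0.0],
     [0.0, -0.2, 0.0, 0.5]]
b = [0.05, -0.1]
h = [1.0, 2.0, -1.0, 0.5]

edited = loreft(h, R, W, b)
# Because R R^T = I, the edited state satisfies R Phi(h) = W h + b, and the
# component of h orthogonal to R's row space is left unchanged.
print(matvec(R, edited), [x + y for x, y in zip(matvec(W, h), b)])
print(loreft_param_count(4096, 4))  # 32772
```

Only R, W, and b are trained, so each intervention costs 2rd + r parameters regardless of the size of the frozen model's weight matrices.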

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

zhengpeng7/birefnet 7 Jan 2024

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

134
0.48 stars / hour

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

sail-sg/clot 5 Dec 2023

To this end, we study LLMs on the popular Oogiri game, which requires participants to be creative and to think associatively in order to respond unexpectedly and humorously to a given image, text, or both, and is therefore suitable for Leap-of-Thought (LoT) study.

Logical Reasoning

190
0.44 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

3,807
0.42 stars / hour

OmniFusion Technical Report

airi-institute/omnifusion 9 Apr 2024

We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality.

Visual Question Answering

164
0.41 stars / hour

CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion

haha-lisa/CreativeSynth 25 Jan 2024

Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.

Image Generation Style Transfer

44
0.41 stars / hour