Moving Object Segmentation: All You Need Is SAM (and Flow)

Jyxarthur/flowsam 18 Apr 2024

The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video.

Motion Segmentation, Object +6

146
1.14 stars / hour

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ailab-cvc/seed-x 22 Apr 2024

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

91
1.08 stars / hour

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

thudm/autowebglm 4 Apr 2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation -- but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web.

Decision Making, Language Modelling +1

294
0.89 stars / hour

Magic Clothing: Controllable Garment-Driven Image Synthesis

shinechen1024/magicclothing 15 Apr 2024

We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.

Image Generation

919
0.82 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation, Language Modelling +2

2,886
0.80 stars / hour

Llama 2: Open Foundation and Fine-Tuned Chat Models

flagalpha/llama2-chinese 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Arithmetic Reasoning +5

10,295
0.78 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

3,895
0.78 stars / hour

AgentKit: Flow Engineering with Graphs, not Coding

holmeswww/agentkit 17 Apr 2024

The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".

198
0.74 stars / hour

Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization

princeton-vl/multislam_diffpose 23 Apr 2024

The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose.

Optical Flow Estimation, Visual Odometry

31
0.71 stars / hour

Towards Large Language Models as Copilots for Theorem Proving in Lean

lean-dojo/leancopilot 18 Apr 2024

In this paper, we explore LLMs as copilots that assist humans in proving theorems.

Automated Theorem Proving, Hallucination

775
0.70 stars / hour