Trending Research

Moving Object Segmentation: All You Need Is SAM (and Flow)

Jyxarthur/flowsam • 18 Apr 2024

The objective of this paper is motion segmentation: discovering and segmenting the moving objects in a video.

Motion Segmentation Object +6

157 stars

1.14 stars / hour

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ailab-cvc/seed-x • 22 Apr 2024

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

109 stars

1.08 stars / hour

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

thudm/autowebglm • 4 Apr 2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation, but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding the model's processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web.

Decision Making Language Modelling +1

304 stars

0.89 stars / hour

Magic Clothing: Controllable Garment-Driven Image Synthesis

shinechen1024/magicclothing • 15 Apr 2024

We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for the previously unexplored task of garment-driven image synthesis.

Image Generation

937 stars

0.82 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR • 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Ranked #7 on Image Generation on ImageNet 256x256

Image Generation Language Modelling +2

2,930 stars

0.80 stars / hour

Llama 2: Open Foundation and Fine-Tuned Chat Models

flagalpha/llama2-chinese • 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Ranked #2 on Question Answering on PubChemQA

Arithmetic Reasoning +5

10,456 stars

0.78 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm • 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

3,947 stars

0.78 stars / hour
