DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

deepseek-ai/DeepSeek-Coder 25 Jan 2024

The rapid development of large language models has revolutionized code intelligence in software development.

Code Generation, Language Modeling, +2

19,070
0.46 stars / hour

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

damo-nlp-sg/videollama3 22 Jan 2025

The key insight of our vision-centric training paradigm is that high-quality image-text data is crucial for both image and video understanding.

Philosophy, Video Understanding

417
0.45 stars / hour

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

2,273
0.44 stars / hour

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

byliutao/1prompt1story 23 Jan 2025

Drawing inspiration from the inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story).

Story Generation, Text-to-Image Generation

146
0.44 stars / hour

TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

saberatalukder/totem 26 Feb 2024

We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) and the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis.

Anomaly Detection, Imputation, +2

191
0.43 stars / hour

LLMs can see and hear without any training

facebookresearch/mils 30 Jan 2025

We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free approach to imbue multimodal capabilities into your favorite LLM.

Audio Captioning, Style Transfer, +1

159
0.39 stars / hour

UnCommon Objects in 3D

facebookresearch/uco3d 13 Jan 2025

We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI.

Object

667
0.37 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

assafelovic/gpt-researcher 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

17,983
0.37 stars / hour

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

hhhhhhao/continuous_tokenizer 14 Dec 2024

With its fully differentiable design and semantically rich latent space, our experiments demonstrate that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models.

Denoising, Image Generation

111
0.37 stars / hour

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

bcmi/awesome-object-insertion 28 Jun 2021

The image composition task can be decomposed into multiple sub-tasks, each of which targets one or more issues.

Image Harmonization, Object, +1

488
0.37 stars / hour