Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

xuezhemax/megalodon 12 Apr 2024

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

475
0.45 stars / hour

L-MAGIC: Language Model Assisted Generation of Images with Coherence

intellabs/mmpano CVPR 2024

However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view.

Depth Estimation, Language Modelling, +2

83
0.44 stars / hour

NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

caiyuanhao1998/retinexformer 22 Apr 2024

This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results.

4k, Low-Light Image Enhancement, +1

591
0.43 stars / hour

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

oliverrensu/mvg 8 Jun 2024

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation, Denoising, +2

25
0.40 stars / hour

FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

ai4finance-foundation/finrobot 23 May 2024

As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community.

AI Agent, Decision Making, +2

968
0.40 stars / hour

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

emo-box/emobox 11 Jun 2024

In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings.

Cross-corpus Speech Emotion Recognition

27
0.40 stars / hour

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

sjtu-ipads/powerinfer 10 Jun 2024

This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity.

Language Modelling, Large Language Model

7,369
0.39 stars / hour

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

tencent/hunyuandit 14 May 2024

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation, Language Modelling, +2

2,288
0.38 stars / hour

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

gojasper/flash-diffusion 4 Jun 2024

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.

Face Swapping, Image Inpainting, +1

175
0.38 stars / hour

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

opengvlab/gui-odyssey 12 Jun 2024

Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms.

Navigate

10
0.38 stars / hour