Trending Research

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

xuezhemax/megalodon • 12 Apr 2024

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

475

0.45 stars / hour

L-MAGIC: Language Model Assisted Generation of Images with Coherence

intellabs/mmpano • CVPR 2024

However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view.

Depth Estimation, Language Modelling, +2

0.44 stars / hour

NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

caiyuanhao1998/retinexformer • 22 Apr 2024

This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results.

4k, Low-Light Image Enhancement, +1

591

0.43 stars / hour

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

oliverrensu/mvg • 8 Jun 2024

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation, Denoising, +2

0.40 stars / hour

FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

ai4finance-foundation/finrobot • 23 May 2024

As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community.

AI Agent, Decision Making, +2

968

0.40 stars / hour

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

emo-box/emobox • 11 Jun 2024

In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings.

Cross-corpus Speech Emotion Recognition

0.40 stars / hour

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

sjtu-ipads/powerinfer • 10 Jun 2024

This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity.

Language Modelling, Large Language Model

7,369

0.39 stars / hour

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

tencent/hunyuandit • 14 May 2024

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation, Language Modelling, +2

2,288

0.38 stars / hour

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

gojasper/flash-diffusion • 4 Jun 2024

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.

Face Swapping, Image Inpainting, +1

175

0.38 stars / hour

GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

opengvlab/gui-odyssey • 12 Jun 2024

Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms.

Navigate

0.38 stars / hour
