Extended Agriculture-Vision: An Extension of a Large Aerial Image Dataset for Agricultural Pattern Analysis

jingwu6/extended-agriculture-vision-dataset 4 Mar 2023

We generate and release an improved version of the Agriculture-Vision dataset (Chiu et al., 2020b) that includes raw, full-field imagery for greater experimental flexibility.

Benchmarking Contrastive Learning +2

160
0.65 stars / hour

AgentSquare: Automatic LLM Agent Search in Modular Design Space

tsinghua-fib-lab/agentsquare 8 Oct 2024

We believe that the modular design space and the AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of the research community.

110
0.59 stars / hour

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

homebrewltd/ichigo 20 Oct 2024

Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities.

Question Answering speech-recognition +1

1,745
0.58 stars / hour

Adaptive Length Image Tokenization via Recurrent Allocation

shivamduggal4/adaptive-length-tokenizer 4 Nov 2024

Our encoder-decoder architecture recursively processes 2D image tokens, distilling them into 1D latent tokens over multiple iterations of recurrent rollouts.

Decoder

66
0.56 stars / hour

A Scalable Communication Protocol for Networks of Large Language Models

agora-protocol/paper-demo 14 Oct 2024

These requisites, which we refer to as the Agent Communication Trilemma, are hard to achieve in large networks of agents.

84
0.54 stars / hour

In-Context LoRA for Diffusion Transformers

ali-vilab/In-Context-LoRA 31 Oct 2024

While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.

Image Generation

366
0.53 stars / hour

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

cvg/NoPoSplat 31 Oct 2024

We utilize the reconstructed 3D Gaussians for novel view synthesis and pose estimation tasks and propose a two-stage coarse-to-fine pipeline for accurate pose estimation.

3D Reconstruction Generalizable Novel View Synthesis +2

439
0.49 stars / hour

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

thudm/cogvideo 12 Aug 2024

We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with the text prompt, at a frame rate of 16 fps and a resolution of 768 × 1360 pixels.

Text-to-Video Generation Video Alignment +2

8,750
0.46 stars / hour

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

alibaba/Tora 31 Jul 2024

The Trajectory Extractor (TE) encodes arbitrary trajectories into hierarchical spacetime motion patches with a 3D video compression network.

Video Compression Video Generation

602
0.46 stars / hour

Moonshine: Speech Recognition for Live Transcription and Voice Commands

usefulsensors/moonshine 21 Oct 2024

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing.

Decoder Position +2

2,104
0.45 stars / hour