Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

FoundationVision/Groma 19 Apr 2024

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.

Language Modelling, Large Language Model +2

294
0.58 stars / hour

OpenVoice: Versatile Instant Voice Cloning

myshell-ai/openvoice 3 Dec 2023

The voice styles are not directly copied from and constrained by the style of the reference speaker.

Voice Cloning

23,533
0.53 stars / hour

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

1,156
0.53 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot, Instruction Following +2

2,395
0.52 stars / hour

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

snap-stanford/stark 19 Apr 2024

Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information.

Benchmarking, Retrieval

178
0.50 stars / hour

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

soccernet/sn-gamestate 17 Apr 2024

This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).

Camera Calibration, Game State Reconstruction

114
0.50 stars / hour

MolTC: Towards Molecular Relational Modeling In Language Models

MangoKiller/MolTC 6 Feb 2024

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.

Relational Reasoning

146
0.46 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,346
0.46 stars / hour

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

McGill-NLP/webllama 8 Feb 2024

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion.

Conversational Web Navigation, Text Generation +1

1,046
0.45 stars / hour

Make Your LLM Fully Utilize the Context

microsoft/FILM 25 Apr 2024

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge.

4k, Information Retrieval +1

183
0.44 stars / hour