The Platonic Representation Hypothesis

minyoungg/platonic-rep 13 May 2024

We argue that representations in AI models, particularly deep networks, are converging.

MarkLLM: An Open-Source Toolkit for LLM Watermarking

thu-bpm/markllm 16 May 2024

The abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives make it difficult for researchers and the community to experiment with, understand, and assess the latest advancements.

MambaOut: Do We Really Need Mamba for Vision?

yuweihao/mambaout 13 May 2024

Image classification has neither the long-sequence nor the autoregressive characteristic, so we hypothesize that Mamba is not necessary for this task. Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks.

Image Classification, Instance Segmentation

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion

robfiras/loco-mujoco 4 Nov 2023

Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents.

Benchmarking, Imitation Learning

VILA: On Pre-training for Visual Language Models

efficient-large-model/vila 12 Dec 2023

Visual language models (VLMs) have progressed rapidly with the recent success of large language models.

In-Context Learning, Language Modelling

From Sora What We Can See: A Survey of Text-to-Video Generation

soraw-ai/awesome-text-to-video-generation 17 May 2024

Following its impressive recent achievements, artificial intelligence is progressing toward artificial general intelligence.

Text-to-Video Generation, Video Generation

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

kongds/mora 20 May 2024

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models.

Continual Pretraining, Mathematical Reasoning

EasySpider: A No-Code Visual System for Crawling the Web

NaiboWang/EasySpider ACM The Web Conference 2023

Web crawling is an essential tool for both computational and non-computational scientists conducting research.

Data Integration, Marketing

WavCraft: Audio Editing and Generation with Large Language Models

jinhualiang/wavcraft 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

