OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

OS-Copilot/FRIDAY 12 Feb 2024

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.

Forgedit: Text Guided Image Editing via Learning and Forgetting

witcherofresearch/forgedit 19 Sep 2023

Text-guided image editing on real images, given only the image and the target text prompt as inputs, is a very general and challenging problem: the editing model must reason by itself about which part of the image should be edited, preserve the characteristics of the original image, and perform complicated non-rigid editing.


PALO: A Polyglot Large Multimodal Model for 5B People

mbzuai-oryx/palo 22 Feb 2024

PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese, that span a total of ~5B people (65% of the world population).

Language Modelling, Large Language Model

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

jishengpeng/languagecodec 19 Feb 2024

Furthermore, we validate the efficiency of Language-Codec on downstream speech language models.

Audio Generation, Quantization

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

A Survey on Knowledge Distillation of Large Language Models

tebmer/awesome-knowledge-distillation-of-llms 20 Feb 2024

In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral.

Data Augmentation, Knowledge Distillation

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

LiheYoung/Depth-Anything 19 Jan 2024

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Data Augmentation, Monocular Depth Estimation

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

louisYen/Gen4Gen 23 Feb 2024

First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION).

Magic-Me: Identity-Specific Video Customized Diffusion

zhen-dong/magic-me 14 Feb 2024

In the field of text-to-image (T2I) generation, subject-driven content generation has made great progress, with the identity (ID) of subjects in the images now controllable.

Video Generation

TorchCP: A Library for Conformal Prediction based on PyTorch

ml-stat-sustech/torchcp 20 Feb 2024

TorchCP is a Python toolbox for conformal prediction research on deep learning models.

Conformal Prediction, Regression

