IMAGDressing-v1: Customizable Virtual Dressing

muzishen/imagdressing 17 Jul 2024

Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience.

Denoising, Image Generation, +1

687
1.95 stars / hour

DataComp-LM: In search of the next generation of training sets for language models

mlfoundations/dclm 17 Jun 2024

We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models.

Language Modelling, MMLU, +1

740
1.90 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

stanford-oval/storm 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

8,145
1.33 stars / hour

E5-V: Universal Embeddings with Multimodal Large Language Models

kongds/e5-v 17 Jul 2024

We propose a single-modality training approach for E5-V, where the model is trained exclusively on text pairs.

119
1.03 stars / hour

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

funaudiollm/cosyvoice 4 Jul 2024

This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).

Emotion Recognition, Event Detection, +6

2,690
0.83 stars / hour

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

KwaiVGI/LivePortrait 3 Jul 2024

Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability.

Computational Efficiency, Face Reenactment, +3

8,287
0.75 stars / hour

Qwen2-Audio Technical Report

qwenlm/qwen2-audio 15 Jul 2024

We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.

Instruction Following, Language Modelling

464
0.67 stars / hour

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

facebookresearch/vggsfm CVPR 2024

Finally, we optimise the cameras and triangulate 3D points via a differentiable bundle adjustment layer.

Camera Calibration, Point Tracking

666
0.67 stars / hour

Cradle: Empowering Foundation Agents Towards General Computer Control

baai-agents/cradle 5 Mar 2024

To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output.

Efficient Exploration

1,325
0.67 stars / hour

Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions

flairnlp/fundus 22 Mar 2024

This paper introduces Fundus, a user-friendly news scraper that enables users to obtain millions of high-quality news articles with just a few lines of code.

254
0.62 stars / hour