Segment Anything in High Quality

syscv/sam-hq 2 Jun 2023

HQ-SAM is trained only on the introduced dataset of 44k masks, which takes only 4 hours on 8 GPUs.

2D Semantic Segmentation, Semantic Segmentation

941
5.67 stars / hour

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

salesforce/codetf 31 May 2023

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

1,025
3.75 stars / hour

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

billxbf/rewoo 23 May 2023

Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.

Retrieval

482
3.01 stars / hour

DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

rikorose/deepfilternet 14 May 2023

Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal.

Speech Enhancement

1,034
2.29 stars / hour

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

vahe1994/spqr 5 Jun 2023

Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities.

Language Modelling, Quantization

145
2.26 stars / hour

Humans in 4D: Reconstructing and Tracking Humans with Transformers

shubham-goel/4D-Humans 31 May 2023

To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.

Action Recognition, Human Mesh Recovery, +1

403
1.68 stars / hour

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)

Action Classification, Action Recognition In Videos, +4

329
1.56 stars / hour

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

vinairesearch/xphonebert 31 May 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.

130
1.51 stars / hour

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

damo-nlp-sg/video-llama 5 Jun 2023

For the second challenge, we leverage ImageBind, a universal embedding model aligning multiple modalities as the pre-trained audio encoder, and introduce an Audio Q-former on top of ImageBind to learn reasonable auditory query embeddings for the LLM module.

Language Modelling, Text Generation, +1

352
1.40 stars / hour

Gorilla: Large Language Model Connected with Massive APIs

ShishirPatil/gorilla 24 May 2023

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis.

Language Modelling, Mathematical Reasoning, +2

3,521
1.38 stars / hour