Segment Anything in High Quality

syscv/sam-hq 2 Jun 2023

HQ-SAM is trained only on the introduced dataset of 44k masks, which takes just 4 hours on 8 GPUs.

2D Semantic Segmentation, Semantic Segmentation

DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

rikorose/deepfilternet 14 May 2023

Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal.

Speech Enhancement
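To make "multi-frame" concrete, here is a minimal sketch of a complex multi-frame filter applied per frequency bin of an STFT: each output frame is a weighted sum of the current and previous input frames (plus an optional lookahead). The function name `deep_filter`, the `(T, N, F)` coefficient layout, and the zero padding at the edges are illustrative assumptions; in a real system the coefficients would be predicted by a network, which is not shown here.

```python
import numpy as np

def deep_filter(spec: np.ndarray, coefs: np.ndarray, lookahead: int = 0) -> np.ndarray:
    """Multi-frame complex filtering (hypothetical sketch).

    spec:  complex STFT, shape (T, F)     (frames x frequency bins)
    coefs: complex filter taps, shape (T, N, F), one N-tap filter per frame and bin
    Computes out[t, f] = sum_i coefs[t, i, f] * spec[t - i + lookahead, f],
    treating frames outside [0, T) as zeros.
    """
    T, F = spec.shape
    N = coefs.shape[1]
    out = np.zeros((T, F), dtype=complex)
    for t in range(T):
        for i in range(N):
            src = t - i + lookahead
            if 0 <= src < T:
                # each bin gets its own complex weight for this past/future frame
                out[t] += coefs[t, i] * spec[src]
    return out

rng = np.random.default_rng(0)
T, F, N = 20, 5, 3
spec = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

# An identity filter (tap 0 = 1, others 0) passes the spectrum through unchanged.
identity = np.zeros((T, N, F), dtype=complex)
identity[:, 0, :] = 1.0
passthrough = deep_filter(spec, identity)

# A filter with only tap 1 set delays the spectrum by one frame.
delay = np.zeros((T, N, F), dtype=complex)
delay[:, 1, :] = 1.0
delayed = deep_filter(spec, delay)
```

Because every tap is complex, the filter can adjust both magnitude and phase using neighbouring frames, which is how short-time correlations in speech can be exploited beyond a single-frame gain mask.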

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

salesforce/codetf 31 May 2023

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

billxbf/rewoo 23 May 2023

Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.
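One way to picture "decoupling reasoning from observations" is a plan that is written once, up front, with placeholders for tool outputs, then executed by workers that fill those placeholders in, and finally handed to a solver. The sketch below assumes a plan format loosely modeled on `#E1 = Tool[input]` steps; the tools (`Lookup`, `Upper`) and the `solve` stand-in are hypothetical, and no LLM is called. It only illustrates the control flow, not ReWOO's actual prompts or code.

```python
import re
from typing import Callable, Dict, List, Tuple

# A plan step: (evidence variable, tool name, tool input that may reference earlier evidence)
PlanStep = Tuple[str, str, str]

EVIDENCE_REF = re.compile(r"#E\d+")

def run_workers(plan: List[PlanStep], tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Execute each step in order, substituting earlier evidence into later inputs."""
    evidence: Dict[str, str] = {}
    for var, tool_name, tool_input in plan:
        filled = EVIDENCE_REF.sub(lambda m: evidence[m.group(0)], tool_input)
        evidence[var] = tools[tool_name](filled)
    return evidence

def solve(question: str, plan: List[PlanStep], evidence: Dict[str, str]) -> str:
    """Hypothetical stand-in for the solver LLM call: just assembles its prompt."""
    lines = [f"Question: {question}"]
    for var, tool_name, tool_input in plan:
        lines.append(f"{var} = {tool_name}[{tool_input}] -> {evidence[var]}")
    return "\n".join(lines)

# Hypothetical tools; a real system would call search APIs, calculators, etc.
tools: Dict[str, Callable[[str], str]] = {
    "Lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
    "Upper": str.upper,
}

# The planner (an LLM in practice) emits the whole plan before any tool runs.
plan: List[PlanStep] = [
    ("#E1", "Lookup", "capital of France"),
    ("#E2", "Upper", "#E1"),
]
evidence = run_workers(plan, tools)
answer = solve("What is the capital of France, in upper case?", plan, evidence)
```

The design point is that tool observations never flow back into the planner's context, so the planner is prompted once rather than once per tool call.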


Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

damo-nlp-sg/video-llama 5 Jun 2023

For the second challenge, we leverage ImageBind, a universal embedding model that aligns multiple modalities, as the pre-trained audio encoder, and introduce an Audio Q-former on top of ImageBind to learn reasonable auditory query embeddings for the LLM module.

Language Modelling, Text Generation +1
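The role of a Q-former-style module on top of a frozen encoder can be sketched as learnable query vectors that cross-attend to the encoder's output tokens, yielding a fixed number of embeddings regardless of input length. This is a deliberately reduced sketch under stated assumptions: a single cross-attention step with random weights, hypothetical sizes (`num_queries=8`, `d_model=16`, `enc_dim=32`), and no self-attention, feed-forward layers, or projection into the LLM's embedding space.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, encoder_feats, w_q, w_k, w_v):
    """Single-head cross-attention: fixed queries attend over variable-length encoder tokens."""
    q = queries @ w_q            # (num_queries, d_model)
    k = encoder_feats @ w_k      # (num_tokens, d_model)
    v = encoder_feats @ w_v      # (num_tokens, d_model)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (num_queries, num_tokens)
    return attn @ v              # (num_queries, d_model)

rng = np.random.default_rng(0)
num_queries, d_model, enc_dim = 8, 16, 32
queries = rng.standard_normal((num_queries, d_model))  # learnable during training; random here
w_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
w_k = rng.standard_normal((enc_dim, d_model)) / np.sqrt(enc_dim)
w_v = rng.standard_normal((enc_dim, d_model)) / np.sqrt(enc_dim)

# Stand-ins for frozen audio-encoder outputs of two clips with different lengths.
short_clip = rng.standard_normal((50, enc_dim))
long_clip = rng.standard_normal((120, enc_dim))
out_short = query_cross_attention(queries, short_clip, w_q, w_k, w_v)
out_long = query_cross_attention(queries, long_clip, w_q, w_k, w_v)
```

Both clips produce the same `(8, 16)` output, which is what lets a language model consume a fixed-size set of "auditory query embeddings" however long the audio is.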

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

vahe1994/spqr 5 Jun 2023

Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities.

Language Modelling, Quantization
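The idea behind a "sparse-quantized representation" can be sketched as: store most weights with a few bits in small groups, and keep a small set of outlier weights in full precision as a sparse list so they do not stretch each group's quantization range. This is a simplified sketch, not SpQR's algorithm: outliers are chosen here simply as the largest-magnitude weights (a swapped-in criterion), and the parameters `bits=3`, `group_size=16`, and `outlier_frac=0.01` are illustrative assumptions.

```python
import numpy as np

def sparse_quantize(weights: np.ndarray, bits: int = 3, group_size: int = 16,
                    outlier_frac: float = 0.01):
    """Group-wise min-max quantization with full-precision sparse outliers (hypothetical sketch).

    Returns (reconstruction, outlier flat indices, outlier values).
    """
    flat = weights.ravel().astype(np.float64)
    assert flat.size % group_size == 0, "size must be a multiple of group_size"
    n_out = max(1, int(outlier_frac * flat.size))
    # Simplified outlier criterion: largest absolute values.
    outlier_idx = np.argsort(np.abs(flat))[-n_out:]
    is_outlier = np.zeros(flat.size, dtype=bool)
    is_outlier[outlier_idx] = True

    levels = 2 ** bits - 1
    recon = np.empty_like(flat)
    for start in range(0, flat.size, group_size):
        g = slice(start, start + group_size)
        vals, keep = flat[g], ~is_outlier[g]
        if keep.any():
            # Group range is computed from non-outliers only.
            lo, hi = vals[keep].min(), vals[keep].max()
            scale = (hi - lo) / levels if hi > lo else 1.0
            codes = np.clip(np.round((vals - lo) / scale), 0, levels)
            dequant = codes * scale + lo
        else:
            dequant = vals
        # Outliers bypass quantization and keep their original value.
        recon[g] = np.where(keep, dequant, vals)
    return recon.reshape(weights.shape), outlier_idx, flat[outlier_idx]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w[3, 5] = 8.0  # an injected outlier weight
recon, idx, vals = sparse_quantize(w)
```

Without the outlier list, the injected value would widen its group's range and coarsen the step size for the other 15 weights in that group; isolating it keeps those weights on a fine grid.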

Fine-Tuning Language Models with Just Forward Passes

princeton-nlp/mezo 27 May 2023

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.
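Fine-tuning "with just forward passes" rests on zeroth-order gradient estimation: perturb the parameters along a random direction z, evaluate the loss at θ + εz and θ − εz, and step along z scaled by the finite difference. The sketch below applies this SPSA-style update to a toy quadratic loss; regenerating z from a seed (rather than storing it) mirrors the memory-saving trick the title alludes to. The toy loss, `eps`, `lr`, and function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def loss_fn(theta: np.ndarray, target: np.ndarray) -> float:
    return float(np.sum((theta - target) ** 2))

def perturb_(theta: np.ndarray, seed: int, scale: float) -> None:
    """In place: theta += scale * z, with z regenerated from `seed` instead of stored."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    theta += scale * z

def mezo_step(theta, target, seed, eps=1e-3, lr=1e-2) -> float:
    """One zeroth-order step using two forward (loss) evaluations and no backprop."""
    perturb_(theta, seed, eps)              # theta + eps*z
    loss_plus = loss_fn(theta, target)
    perturb_(theta, seed, -2 * eps)         # theta - eps*z
    loss_minus = loss_fn(theta, target)
    perturb_(theta, seed, eps)              # restore theta
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    perturb_(theta, seed, -lr * projected_grad)  # theta -= lr * projected_grad * z
    return loss_fn(theta, target)

target = np.linspace(0.0, 1.0, 10)
theta = np.zeros(10)
initial_loss = loss_fn(theta, target)
seeds = np.random.default_rng(0).integers(0, 2**31, size=1000)
for s in seeds:
    final_loss = mezo_step(theta, target, int(s))
```

Only the parameters and a seed are kept in memory per step; no activations or gradients are stored, which is the property that matters when backpropagation memory is the bottleneck.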


Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

x-plug/youku-mplug 7 Jun 2023

In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks: cross-modal retrieval, video captioning, and video category classification.

Cross-Modal Retrieval, Language Modelling +2

A Literature Study of Embeddings on Source Code

boyter/cs 5 Apr 2019

In this survey, we aim to collect and discuss the usage of word embedding techniques on programs and source code.

Matting Anything

shi-labs/matting-anything 8 Jun 2023

In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance.

Image Matting, Referring Image Matting
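An alpha matte is the per-pixel opacity α in the standard compositing equation I = αF + (1 − α)B, where F is the foreground and B the background. The sketch below shows only this equation, which is what an estimated matte is used for downstream; it does not implement MAM or any matte estimation, and the array shapes are assumptions.

```python
import numpy as np

def composite(alpha: np.ndarray, fg: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Blend foreground over background with a per-pixel alpha matte.

    alpha: (H, W) values in [0, 1]; fg, bg: (H, W, 3) images.
    Implements I = alpha * F + (1 - alpha) * B.
    """
    a = np.clip(alpha, 0.0, 1.0)[..., None]  # broadcast over color channels
    return a * fg + (1.0 - a) * bg

fg = np.full((2, 2, 3), 1.0)   # white foreground
bg = np.zeros((2, 2, 3))       # black background
opaque = composite(np.ones((2, 2)), fg, bg)
clear = composite(np.zeros((2, 2)), fg, bg)
half = composite(np.full((2, 2), 0.5), fg, bg)
```

Fractional α values are what distinguish matting from binary segmentation: hair strands or motion blur get partial opacity instead of a hard in/out label.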

