Segment Anything in High Quality

syscv/sam-hq 2 Jun 2023

HQ-SAM is trained only on the introduced dataset of 44k masks, which takes only 4 hours on 8 GPUs.

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

salesforce/codetf 31 May 2023

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

billxbf/rewoo 23 May 2023

Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.


XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

vinairesearch/xphonebert 31 May 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.

Speech Synthesis

Humans in 4D: Reconstructing and Tracking Humans with Transformers

shubham-goel/4D-Humans 31 May 2023

To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D.

Action Recognition Human Mesh Recovery +1

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)

Action Classification Action Recognition In Videos +4

Scene as Occupancy

opendrivelab/occnet 5 Jun 2023

Human drivers can easily describe a complex traffic scene using their visual system.

Motion Planning

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

mit-han-lab/llm-awq 1 Jun 2023

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).

Common Sense Reasoning Language Modelling +1

Gorilla: Large Language Model Connected with Massive APIs

ShishirPatil/gorilla 24 May 2023

Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis.

Language Modelling Mathematical Reasoning +2

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

kyegomez/tree-of-thoughts 17 May 2023

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference.

Decision Making Language Modelling

