Language-conditioned Detection Transformer

janghyuncho/decola 29 Nov 2023

We use this detector to pseudo-label images with image-level labels.

Pseudo Label

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

alibaba-damo-academy/FunASR 7 Aug 2023

It combines the accuracy of AED-based models, the efficiency of NAR models, and a strong, explicit hotword customization capability.

White-Box Transformers via Sparse Rate Reduction

ma-lab-berkeley/crate NeurIPS 2023

Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.
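To make the compression claim concrete, here is a minimal sketch of one gradient descent step on a lossy coding rate. It assumes the standard rate-distortion form R(Z) = 1/2 logdet(I + d/(N eps^2) Z Z^T) for a token matrix Z of shape (d, N); the function names, the distortion `eps`, and the step size are illustrative choices. It is not the paper's attention operator, which measures the rate against learned subspaces; it only shows that a step along the negative gradient lowers the rate.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Lossy coding rate 1/2 * logdet(I + alpha * Z Z^T), alpha = d / (N * eps^2)."""
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    _, logdet = np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)
    return 0.5 * logdet

def compression_step(Z, eps=0.5, step=0.1):
    """One gradient descent step on the coding rate (hypothetical helper).

    The gradient of the rate with respect to Z is
    alpha * (I + alpha * Z Z^T)^{-1} Z, computed here with a linear solve.
    """
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    grad = alpha * np.linalg.solve(np.eye(d) + alpha * Z @ Z.T, Z)
    return Z - step * grad

# Toy token set: 32 tokens of dimension 8 (random, for illustration only).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 32))
compressed = compression_step(tokens)

print("rate before:", coding_rate(tokens))
print("rate after: ", coding_rate(compressed))
```

The step shrinks every singular value of the token matrix, so the rate after the step is lower than before.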

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

cswry/seesr 27 Nov 2023

First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation.

Image Super-Resolution

AlignBench: Benchmarking Chinese Alignment of Large Language Models

thudm/alignbench 30 Nov 2023

Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants.


Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

rohitgandikota/sliders 20 Nov 2023

We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.

Image Generation
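As a rough illustration of the slider idea in the entry above, the sketch below applies a generic LoRA-style low-rank update to a single linear layer and exposes its strength as a scalar. The names `lora_linear`, `A`, `B`, and `scale` are assumptions made for this example; how the adaptors are trained and where they attach in a diffusion model is not shown and is not taken from the repository.

```python
import numpy as np

def lora_linear(x, W, A, B, scale):
    """Linear layer with a low-rank update: y = x @ (W + scale * B @ A)^T.

    W is the frozen base weight (d_out, d_in); B (d_out, r) and A (r, d_in)
    form the low-rank adaptor; scale plays the role of the slider value.
    """
    return x @ (W + scale * (B @ A)).T

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 4, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((rank, d_in))
B = rng.standard_normal((d_out, rank))
x = rng.standard_normal((3, d_in))

base = lora_linear(x, W, A, B, 0.0)    # slider at zero: base layer unchanged
weak = lora_linear(x, W, A, B, 1.0)
strong = lora_linear(x, W, A, B, 2.0)  # doubling the slider doubles the shift
```

Because the update is linear in `scale`, moving the slider changes the layer output proportionally, and a negative value pushes in the opposite direction.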

SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation

jiehonglin/sam-6d 27 Nov 2023

Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D poses in cluttered scenes, presenting significant challenges for model generalizability.

6D Pose Estimation using RGB, Instance Segmentation, +3

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

yl4579/StyleTTS2 31 May 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task.

RETVec: Resilient and Efficient Text Vectorizer

google-research/retvec NeurIPS 2023

The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks.

Adversarial Text, Metric Learning, +1
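For readers unfamiliar with pair-wise metric learning, the sketch below shows a generic contrastive pair loss: embeddings of matching pairs (for example a word and a typo of it) are pulled together, and non-matching pairs are pushed at least a margin apart. This is an assumed, textbook-style loss chosen for illustration; the loss actually used to pre-train RETVec may differ, and `contrastive_pair_loss` is a hypothetical name.

```python
import numpy as np

def contrastive_pair_loss(a, b, same, margin=1.0):
    """Mean contrastive loss over pairs of embeddings.

    a, b: arrays of shape (n_pairs, dim); same: 1.0 for matching pairs, else 0.0.
    Matching pairs pay squared distance; non-matching pairs pay only when
    they are closer than the margin.
    """
    dist = np.linalg.norm(a - b, axis=1)
    pull = same * dist ** 2
    push = (1.0 - same) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))

# Three toy pairs: a matching pair at distance 0, and two non-matching
# pairs at distances 0.5 (inside the margin) and 2.0 (outside it).
a = np.zeros((3, 2))
b = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 0.0]])
same = np.array([1.0, 0.0, 0.0])

loss = contrastive_pair_loss(a, b, same)
print(loss)
```

Only the non-matching pair that sits inside the margin contributes to the loss here, which is the behavior that drives dissimilar inputs apart during training.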

GLM-130B: An Open Bilingual Pre-trained Model

thudm/chatglm 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

Language Modelling, Multi-task Language Understanding, +1

