DAMO-YOLO : A Report on Real-Time Object Detection Design

tinyvision/damo-yolo 23 Nov 2022

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.

Neural Architecture Search object-detection

Dual PatchNorm

lucidrains/musiclm-pytorch 2 Feb 2023

We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers.
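As a rough illustration of the idea in the sentence above, the sketch below wraps a patch embedding with one normalization before and one after it. It is a minimal pure-Python sketch, not the authors' implementation: the helper names (`layer_norm`, `linear`, `dual_patchnorm_embed`), the toy weights, and the omission of the learned scale and shift that a real LayerNorm carries are all assumptions made for brevity.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one vector to zero mean and unit variance.

    Assumption: the learned scale/shift of a real LayerNorm is omitted.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Dense projection; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def dual_patchnorm_embed(patches, weight, bias):
    """LayerNorm -> patch embedding -> LayerNorm, per flattened patch."""
    return [layer_norm(linear(layer_norm(p), weight, bias)) for p in patches]

# Hypothetical toy input: two flattened patches of 4 pixel values each,
# projected to a 3-dimensional embedding.
patches = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.0, 30.0, 50.0]]
weight = [[0.5, -0.2, 0.1, 0.3],
          [0.0, 0.4, -0.6, 0.2],
          [0.7, 0.1, 0.0, -0.5]]
bias = [0.1, 0.0, -0.1]

tokens = dual_patchnorm_embed(patches, weight, bias)
```

In this sketch each resulting token has approximately zero mean and unit variance regardless of the raw pixel scale of its patch, which is the effect of the second normalization.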

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

AttendAndExcite/Attend-and-Excite 31 Jan 2023

Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt.

Generative Semantic Nursing

Salesforce CausalAI Library: A Fast and Scalable Framework for Causal Analysis of Time Series and Tabular Data

salesforce/causalai 25 Jan 2023

We introduce the Salesforce CausalAI Library, an open-source library for causal analysis using observational data.

Causal Discovery Causal Inference

OpenSpike: An OpenRAM SNN Accelerator

sfmth/openspike 2 Feb 2023

This paper presents a spiking neural network (SNN) accelerator made using fully open-source EDA tools, process design kit (PDK), and memory macros synthesized using OpenRAM.

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

sense-gvt/fast-bev 29 Jan 2023

Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, (3) an efficient BEV encoder designed specifically to speed up on-vehicle inference.

Data Augmentation

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

facebookresearch/cutler 26 Jan 2023

We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models.

Instance Segmentation object-detection

Epic-Sounds: A Large-scale Dataset of Actions That Sound

epic-kitchens/epic-sounds-annotations 1 Feb 2023

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos.

Action Recognition

LAION-5B: An open large-scale dataset for training next generation image-text models

mlfoundations/open_clip NeurIPS 2022 Datasets and Benchmarks 2022

We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale.

Image Generation Zero-Shot Learning

ArchiSound: Audio Generation with Diffusion

archinetai/audio-diffusion-pytorch 30 Jan 2023

The recent surge in popularity of diffusion models for image generation has brought new attention to the potential of these models in other areas of media generation.

Audio Generation Image Generation

