Adaptive Computation with Elastic Input Sequence

google-research/scenic 30 Jan 2023

However, most standard neural networks apply the same function type and a fixed computation budget to every sample, regardless of its nature and difficulty.

Inductive Bias

Token Turing Machines

google-research/scenic CVPR 2023

The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.
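The bounded per-step cost described above can be illustrated with a minimal JAX sketch. This is not the paper's implementation: the names `summarize`, `ttm_like_step`, `run`, the weight matrices, and the `tanh` stand-in for the processing unit are all hypothetical, and the softmax-weighted pooling is just one assumed way to compress tokens. The point is only that each step sees the fixed-size memory plus the new observation, never the full history.

```python
import jax
import jax.numpy as jnp


def summarize(tokens, w):
    """Pool (n, d) tokens down to k tokens with softmax weights; w is (d, k). Hypothetical."""
    logits = tokens @ w                        # (n, k): one score per input token and output slot
    weights = jax.nn.softmax(logits, axis=0)   # normalize over the n input tokens
    return weights.T @ tokens                  # (k, d)


def ttm_like_step(memory, observation, w_read, w_write):
    """One step: read from memory + new observation, then write back to a fixed-size memory."""
    # Read: only the memory and the current observation are visible.
    read = summarize(jnp.concatenate([memory, observation], axis=0), w_read)
    output = jnp.tanh(read)  # stand-in for the processing unit (assumption)
    # Write: compress memory, output and observation back to the original memory size.
    new_memory = summarize(
        jnp.concatenate([memory, output, observation], axis=0), w_write
    )
    return new_memory, output


def run(observations, memory, w_read, w_write):
    """Scan over a sequence of observations; the carried memory never grows."""
    def body(mem, obs):
        return ttm_like_step(mem, obs, w_read, w_write)
    return jax.lax.scan(body, memory, observations)


# Illustrative sizes (assumptions): d features, m memory tokens, r read tokens,
# n tokens per observation, T time steps.
d, m, r, n, T = 16, 8, 4, 10, 5
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
memory0 = jax.random.normal(k1, (m, d))
observations = jax.random.normal(k2, (T, n, d))
w_read = jax.random.normal(k3, (d, r))
w_write = jax.random.normal(k4, (d, m))

final_memory, outputs = run(observations, memory0, w_read, w_write)
```

Because the memory shape is the same before and after every step, the work per step depends on `m` and `n` only, not on how many observations came before.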

Action Detection, Activity Detection

Multiview Transformers for Video Recognition

google-research/scenic CVPR 2022

Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.

Ranked #2 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Classification, Action Recognition, +1

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

google-research/scenic 21 Jun 2021

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks.
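As a rough illustration of reducing a feature map to a handful of learned tokens, the sketch below pools a 14x14 grid into 8 tokens. It is an assumption-laden simplification rather than the paper's module: `learn_tokens` and the projection `w` are hypothetical names, and a single linear projection followed by a sigmoid stands in for whatever function produces the spatial weight maps.

```python
import jax
import jax.numpy as jnp


def learn_tokens(features, w):
    """Map an (H, W, C) feature map to S tokens of size C; w is a (C, S) projection (hypothetical)."""
    h, wd, c = features.shape
    flat = features.reshape(h * wd, c)    # (H*W, C): one row per spatial position
    maps = jax.nn.sigmoid(flat @ w)       # (H*W, S): one spatial weight map per token
    # Each token is the spatial average of the features, weighted by its own map.
    return (maps.T @ flat) / (h * wd)     # (S, C)


key_f, key_w = jax.random.split(jax.random.PRNGKey(0))
features = jax.random.normal(key_f, (14, 14, 32))  # illustrative sizes (assumption)
w = jax.random.normal(key_w, (32, 8))              # 8 tokens, as in the paper title

tokens = learn_tokens(features, w)  # 196 spatial positions reduced to 8 tokens
```

Since the weight maps are computed from the input itself, the tokens adapt to each image; later layers then operate on 8 tokens instead of 196 positions.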

Action Classification, Image Classification, +3

ViViT: A Video Vision Transformer

google-research/scenic ICCV 2021

We present pure-transformer-based models for video classification, drawing upon the recent success of such models in image classification.

Ranked #8 on Action Classification on Moments in Time (Top 5 Accuracy metric, using extra training data)

Action Classification, Action Recognition, +4

Learning with Neighbor Consistency for Noisy Labels

google-research/scenic CVPR 2022

Recent advances in deep learning have relied on large, labelled datasets to train high-capacity models.

Learning with noisy labels

SCENIC: A JAX Library for Computer Vision Research and Beyond

google-research/scenic CVPR 2022

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

