Search Results for author: Daniel Bolya

Found 14 papers, 11 papers with code

Perception Encoder: The best visual embeddings are not at the output of the network

1 code implementation • 17 Apr 2025 Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Daniel Li, Piotr Dollár, Christoph Feichtenhofer

Together with the core contrastive checkpoint, our PE family of models achieves state-of-the-art performance on a wide variety of tasks, including zero-shot image and video classification and retrieval; document, image, and video Q&A; and spatial tasks such as detection, depth estimation, and tracking.

Ranked #1 on Object Detection on COCO minival (using extra training data)

Depth Estimation Language Modeling +4

Window Attention is Bugged: How not to Interpolate Position Embeddings

no code implementations • 9 Nov 2023 Daniel Bolya, Chaitanya Ryali, Judy Hoffman, Christoph Feichtenhofer

To fix it, we introduce a simple absolute window position embedding strategy, which solves the bug outright in Hiera and allows us to increase both speed and performance of the model in ViTDet.

Position
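The abstract's "absolute window position embedding" can be sketched as follows: instead of interpolating one global embedding when the image size changes, each window shares a learned window-local embedding, and a separate low-resolution embedding covers window positions. This is a minimal NumPy sketch, not the official Hiera/ViTDet code; the function name, random initialization, and exact composition are assumptions for illustration.

```python
import numpy as np

def absolute_window_pos_embed(grid_h, grid_w, win, dim, rng=None):
    """Hypothetical sketch of an absolute window position embedding.

    Every window shares the same learned window-local embedding, and a
    coarse global embedding (one vector per window) is added on top, so
    changing the grid size never interpolates across window boundaries.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # learned embedding for positions *within* a window, shared by all windows
    win_embed = rng.normal(size=(win, win, dim))
    # learned global embedding at window granularity
    n_h, n_w = grid_h // win, grid_w // win
    global_embed = rng.normal(size=(n_h, n_w, dim))
    # tile the window-local embedding across the grid of windows
    pos = np.tile(win_embed, (n_h, n_w, 1))
    # upsample the global embedding to token resolution by repetition and add it
    pos = pos + np.repeat(np.repeat(global_embed, win, axis=0), win, axis=1)
    return pos  # shape (grid_h, grid_w, dim)
```

The key property is that resizing the token grid only changes how many times the window-local embedding is tiled; nothing window-internal is ever interpolated.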

ZipIt! Merging Models from Different Tasks without Training

1 code implementation • 4 May 2023 George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman

While this works for models trained on the same task, we find that this fails to account for the differences in models trained on disjoint tasks.

Token Merging for Fast Stable Diffusion

4 code implementations • 30 Mar 2023 Daniel Bolya, Judy Hoffman

In the process, we speed up image generation by up to 2x and reduce memory consumption by up to 5.6x.

Image Generation

Token Merging: Your ViT But Faster

5 code implementations • 17 Oct 2022 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

Off-the-shelf, ToMe can 2x the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and 2.2x the throughput of ViT-L on video with only a 0.2-0.3% accuracy drop in each case.

Efficient ViTs
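ToMe's core idea, bipartite soft matching, can be illustrated with a toy NumPy sketch: split the tokens into two sets, find each token's most similar counterpart by cosine similarity, and average the r most similar pairs together. This is a simplified illustration only; the official implementation differs in important ways (it matches on attention keys, tracks token sizes for proportional attention, and avoids Python loops), and the function name below is an assumption.

```python
import numpy as np

def tome_merge(x, r):
    """Toy sketch of one token-merging step (bipartite soft matching).

    x: (n, d) token features; r: number of tokens to remove.
    Returns (n - r, d) merged tokens.
    """
    # alternate tokens into two sets so matching stays bipartite
    a, b = x[0::2].copy(), x[1::2].copy()
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T                         # cosine similarity, (|a|, |b|)
    best = sim.argmax(axis=1)                 # best match in b for each token in a
    order = sim.max(axis=1).argsort()[::-1]   # most similar a-tokens first
    merged, unmerged = order[:r], order[r:]
    counts = np.ones(len(b))
    for i in merged:                          # fold each merged a-token into its match
        j = best[i]
        b[j] = (b[j] * counts[j] + a[i]) / (counts[j] + 1)
        counts[j] += 1
    return np.concatenate([a[unmerged], b], axis=0)
```

Because merging is gradual (r tokens per layer) and averages similar tokens rather than dropping them, throughput improves while accuracy degrades only slightly, as the abstract reports.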

Hydra Attention: Efficient Attention with Many Heads

no code implementations • 15 Sep 2022 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman

While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult.
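The paper's remedy for that cost is to use as many attention heads as feature dimensions, which collapses multi-head attention into an elementwise form that is linear in the number of tokens. A minimal NumPy sketch of that idea, under the assumption of cosine-similarity (L2-normalized) queries and keys as described in the paper; the function name and shapes are illustrative:

```python
import numpy as np

def hydra_attention(q, k, v):
    """Sketch of Hydra-style attention: with one head per feature,
    attention reduces to an elementwise, token-linear operation.

    q, k, v: (n, d) arrays of n tokens with d features.
    """
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)  # cosine-normalize queries
    kn = k / np.linalg.norm(k, axis=1, keepdims=True)  # cosine-normalize keys
    kv = (kn * v).sum(axis=0)                          # global feature, O(n*d)
    return qn * kv                                     # gate each token, (n, d)
```

Summing key-value products first, then gating by the queries, costs O(nd) rather than the O(n²d) of standard attention, which is what makes large images tractable.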

Scalable Diverse Model Selection for Accessible Transfer Learning

1 code implementation NeurIPS 2021 Daniel Bolya, Rohit Mittapalli, Judy Hoffman

In this paper, we formalize this setting as "Scalable Diverse Model Selection" and propose several benchmarks for evaluating on this task.

Diversity model +2

Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses

no code implementations • 25 Aug 2020 Fu Lin, Rohit Mittapalli, Prithvijit Chattopadhyay, Daniel Bolya, Judy Hoffman

Convolutional Neural Networks have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to where normal data lies, but which are not naturally occurring and are of low probability.

Adversarial Defense Adversarial Robustness

TIDE: A General Toolbox for Identifying Object Detection Errors

2 code implementations ECCV 2020 Daniel Bolya, Sean Foley, James Hays, Judy Hoffman

We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms.

Instance Segmentation object-detection +2
