Search Results for author: James Gabriel

Found 4 papers, 4 papers with code

FastVLM: Efficient Vision Encoding for Vision Language Models

1 code implementation 17 Dec 2024 Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari

At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency.

HUGS: Human Gaussian Splats

1 code implementation CVPR 2024 Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train than previous work.

3DGS, Neural Rendering, +1 more

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

6 code implementations ICCV 2023 Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.

Ranked #2 on Semantic Segmentation on ADE20K (Mean IoU (class) metric)

3D Hand Pose Estimation, Image Classification, +1 more

MobileOne: An Improved One millisecond Mobile Backbone

10 code implementations CVPR 2023 Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Furthermore, we show that our model generalizes to multiple tasks: image classification, object detection, and semantic segmentation, with significant improvements in latency and accuracy compared to existing efficient architectures when deployed on a mobile device.

Ranked #3 on Image Classification on ImageNet (Number of params metric)

Efficient Neural Network, Gaze Estimation, +3 more
