Search Results for author: Howard Zhou

Found 8 papers, 3 papers with code

HAMMR: HierArchical MultiModal React agents for generic VQA

no code implementations 8 Apr 2024 Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.

Optical Character Recognition (OCR), Question Answering

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

no code implementations 5 Mar 2024 Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.

Image Classification, Question Answering

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation ICCV 2023 Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.

Question Answering, Retrieval

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

no code implementations 22 Mar 2023 Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, André Araujo

Finding localized correspondences across different images of the same object is crucial to understand its geometry.


IBRNet: Learning Multi-View Image-Based Rendering

1 code implementation CVPR 2021 Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, Thomas Funkhouser

Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes.

Neural Rendering, Novel View Synthesis

Blockout: Dynamic Model Selection for Hierarchical Deep Networks

no code implementations CVPR 2016 Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

Most deep architectures for image classification--even those that are trained to classify a large number of diverse categories--learn shared image representations with a single model.

Clustering, General Classification

The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition

1 code implementation 20 Nov 2015 Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei

Current approaches for fine-grained recognition do the following: First, recruit experts to annotate a dataset of images, optionally also collecting more structured data in the form of part annotations and bounding boxes.

Ranked #5 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Active Learning, Fine-Grained Image Classification
