no code implementations • 13 Jun 2024 • Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen
This work does not introduce a new method.
no code implementations • 9 Oct 2023 • Duy-Kien Nguyen, Martin R. Oswald, Cees G. M. Snoek
The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors.
1 code implementation • 8 Jun 2023 • Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.
1 code implementation • CVPR 2022 • Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M. Snoek
Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on an input feature map.
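To make the box-attention idea concrete, here is a minimal NumPy sketch of attending to a reference window whose transformation (here just a translation) is predicted from the query. The projection matrices, grid size, and nearest-neighbour sampling are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def box_attention(feature_map, query, ref_box, grid=2, rng=None):
    """Toy sketch of box-attention: predict a translation of a reference
    window from the query, sample the feature map on a small grid inside
    the moved box, and attend over the sampled vectors.
    All projections here are random stand-ins for learned weights."""
    H, W, C = feature_map.shape
    rng = rng or np.random.default_rng(0)
    W_off = rng.standard_normal((C, 2)) * 0.01    # predicts (dx, dy)
    W_attn = rng.standard_normal((C, grid * grid))
    dx, dy = query @ W_off                        # predicted box transformation
    x0, y0, w, h = ref_box
    x0, y0 = x0 + dx, y0 + dy
    # Sample grid points inside the transformed box (nearest neighbour
    # instead of bilinear, to keep the sketch short).
    xs = np.clip(np.round(x0 + (np.arange(grid) + 0.5) / grid * w).astype(int), 0, W - 1)
    ys = np.clip(np.round(y0 + (np.arange(grid) + 0.5) / grid * h).astype(int), 0, H - 1)
    samples = np.stack([feature_map[y, x] for y in ys for x in xs])  # (grid*grid, C)
    attn = np.exp(query @ W_attn)
    attn /= attn.sum()                            # attention over sampled points
    return attn @ samples                         # weighted sum, shape (C,)

feat = np.random.default_rng(1).standard_normal((8, 8, 16))
q = np.random.default_rng(2).standard_normal(16)
out = box_attention(feat, q, ref_box=(1.0, 1.0, 4.0, 4.0))
print(out.shape)  # (16,)
```

The key point the sketch captures is that the attended locations are not fixed: they follow a box whose placement is itself predicted from the query.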
1 code implementation • ICLR 2021 • Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen
This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g., a question or a category).
Ranked #1 on Object Counting on HowMany-QA
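As a task illustration (not the paper's model), one generic way to frame open-ended counting is to score each image region against the query and accumulate soft relevance scores into a count. The region features, query embedding, and sigmoid scoring below are all hypothetical.

```python
import numpy as np

def count_by_region_scoring(region_feats, query_emb):
    """Hedged sketch of a generic counting setup: score each image region
    against the query and sum per-region soft scores into a count."""
    scores = region_feats @ query_emb            # relevance of each region
    probs = 1.0 / (1.0 + np.exp(-scores))        # each region contributes 0..1
    return float(probs.sum())

rng = np.random.default_rng(0)
regions = rng.standard_normal((36, 8))           # e.g. 36 detected regions
query = rng.standard_normal(8)                   # embedded question or category
count = count_by_region_scoring(regions, query)
print(round(count, 2))
```

The count is bounded by the number of regions, which makes the failure modes (over- and under-counting) easy to inspect.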
no code implementations • 20 Jan 2020 • Tsun-Yi Yang, Duy-Kien Nguyen, Huub Heijnen, Vassileios Balntas
In this paper, we explore how three related tasks, namely keypoint detection, description, and image retrieval, can be jointly tackled using a single unified framework trained without the need for training data with point-to-point correspondences.
no code implementations • CVPR 2019 • Duy-Kien Nguyen, Takayuki Okatani
The representation is hierarchical, and prediction for each task is computed from the representation at its corresponding level of the hierarchy.
1 code implementation • CVPR 2018 • Duy-Kien Nguyen, Takayuki Okatani
A key to visual question answering (VQA) lies in how to fuse the visual and language features extracted from an input image and question.
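The fusion idea can be sketched as a symmetric co-attention between the two modalities: an affinity matrix lets each image region attend over question words and vice versa. The sketch below omits learned projections and stacking, so it illustrates the general mechanism rather than the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attend(V, Q):
    """Minimal sketch of symmetric co-attention between visual regions V
    (n_regions x d) and question words Q (n_words x d)."""
    A = V @ Q.T                       # region-word affinity matrix
    V_ctx = softmax(A, axis=1) @ Q    # each region summarises relevant words
    Q_ctx = softmax(A.T, axis=1) @ V  # each word summarises relevant regions
    return V_ctx, Q_ctx

rng = np.random.default_rng(0)
V = rng.standard_normal((10, 16))     # 10 image regions
Q = rng.standard_normal((5, 16))      # 5 question tokens
V_ctx, Q_ctx = co_attend(V, Q)
print(V_ctx.shape, Q_ctx.shape)       # (10, 16) (5, 16)
```

Because the attention runs in both directions over the same affinity matrix, neither modality is treated as primary, which is the sense in which the co-attention is symmetric.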