no code implementations • 26 Nov 2024 • Jiaxuan Li, Junwen Mo, MinhDuc Vo, Akihiro Sugimoto, Hideki Nakayama
Multimodal Large Language Models (MLLMs) have made notable advances in visual understanding, yet their ability to recognize objects modified by specific attributes remains an open question.
no code implementations • CVPR 2024 • Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from External Visual-name memory (EVCap).
no code implementations • CVPR 2023 • Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama
Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time.
no code implementations • 12 Apr 2023 • Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, E-Ro Nguyen, Minh-Khoi Nguyen-Nhat, Tuan-An To, Trung-Truc Huynh-Le, Nham-Tan Nguyen, Hoang-Chau Luong, Truong Hoai Phong, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Hai-Dang Nguyen, Akihiro Sugimoto, Minh-Triet Tran
Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval.
no code implementations • 12 Apr 2023 • Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Ngoc-Linh Nguyen-Ha, Tuong-Vy Truong-Thuy, Truong Hoai Phong, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, Hoai-Danh Vo, Minh-Hoa Doan, Hai-Dang Nguyen, Akihiro Sugimoto, Minh-Triet Tran
To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches.
1 code implementation • IEEE Access 2022 • Masato Fujitake, Akihiro Sugimoto
In this paper, we enhance features element-wise before object candidate region detection, proposing the Video Sparse Transformer with Attention-guided Memory (VSTAM).
Ranked #1 on Object Detection on UA-DETRAC
no code implementations • CVPR 2022 • Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama
We propose an end-to-end Novel Object Captioning with Retrieved vocabulary from External Knowledge method (NOC-REK), which simultaneously learns vocabulary retrieval and caption generation, successfully describing novel objects outside of the training dataset.
1 code implementation • 16 Mar 2022 • Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le
Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos, which is challenging yet plays an important role in many video analysis and understanding tasks.
no code implementations • 16 Mar 2022 • Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama
We push forward neural network compression research by exploiting a novel challenging task of large-scale conditional generative adversarial networks (GANs) compression.
no code implementations • 17 Jul 2021 • Viet-Khoa Vo-Ho, Ngan Le, Kashu Yamazaki, Akihiro Sugimoto, Minh-Triet Tran
Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos.
2 code implementations • Computer Vision and Image Understanding 2019 • Trung-Nghia Le, Tam V. Nguyen, Zhongliang Nie, Minh-Triet Tran, Akihiro Sugimoto
Unlike existing segmentation networks, our proposed network has a second branch for classification that predicts the probability that an image contains camouflaged object(s); this prediction is then fused into the main segmentation branch to boost segmentation accuracy.
no code implementations • ECCV 2020 • Zuzana Kukelova, Cenek Albl, Akihiro Sugimoto, Konrad Schindler, Tomas Pajdla
The internal geometry of most modern consumer cameras is not adequately described by the perspective projection.
1 code implementation • CVPR 2020 • Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi
In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression.
no code implementations • 19 Nov 2019 • Duc Minh Vo, Akihiro Sugimoto
The semantic content feature and the style representation feature are then concatenated adaptively and fed into the decoder to generate style-transferred (stylized) images.
no code implementations • ECCV 2020 • Duc Minh Vo, Akihiro Sugimoto
We also use each relation separately to predict, from the initial bounding boxes, relation-units for all the relations in the input text.
no code implementations • 30 Dec 2018 • Zuzana Kukelova, Cenek Albl, Akihiro Sugimoto, Tomas Pajdla
Our best 6-point solver, based on the new alternation technique, performs as well as or better than the state-of-the-art R6P solver and is two orders of magnitude faster.
no code implementations • 4 Jul 2018 • Trung-Nghia Le, Akihiro Sugimoto
In addition, to tackle the task of VSSIS, we augment the DAVIS-2017 benchmark dataset by assigning semantic ground-truth labels to salient instances, obtaining the SEmantic Salient Instance Video (SESIV) dataset.
1 code implementation • SPIE Journal of Electronic Imaging 2017 • Jiří Borovec, Jan Kybic, Akihiro Sugimoto
Region growing is a classical image segmentation method based on hierarchical region aggregation using local similarity rules.
no code implementations • 4 Aug 2017 • Trung-Nghia Le, Akihiro Sugimoto
STCRF is our extension of CRF to the temporal domain and describes the relationships among neighboring regions both in a frame and over frames.
no code implementations • 4 Aug 2017 • Trung-Nghia Le, Akihiro Sugimoto
Detecting salient objects from a video requires exploiting both spatial and temporal knowledge included in the video.