Search Results for author: Thao Minh Le

Found 12 papers, 4 papers with code

Deep Neural Networks for Visual Reasoning

no code implementations24 Sep 2022 Thao Minh Le

Visual perception and language understanding are fundamental components of human intelligence, enabling humans to understand and reason about objects and their interactions.

Multimodal Reasoning Visual Reasoning

Video Dialog as Conversation about Objects Living in Space-Time

1 code implementation8 Jul 2022 Hoang-Anh Pham, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran

To tackle these challenges, we present a new object-centric framework for video dialog that supports neural reasoning, dubbed COST, which stands for Conversation about Objects in Space-Time.

Object Relational Reasoning +3

Guiding Visual Question Answering with Attention Priors

no code implementations25 May 2022 Thao Minh Le, Vuong Le, Sunil Gupta, Svetha Venkatesh, Truyen Tran

This grounding guides the attention mechanism inside VQA models through a duality of mechanisms: pre-training attention weight calculation and directly guiding the weights at inference time on a case-by-case basis.

Question Answering Visual Grounding +2

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

no code implementations25 Jun 2021 Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran

Toward reaching this goal, we propose an object-oriented reasoning approach in which video is abstracted as a dynamic stream of interacting objects.

Object Question Answering +1

Object-Centric Representation Learning for Video Question Answering

no code implementations12 Apr 2021 Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran

Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors.

Object Question Answering +3

GEFA: Early Fusion Approach in Drug-Target Affinity Prediction

1 code implementation25 Sep 2020 Tri Minh Nguyen, Thin Nguyen, Thao Minh Le, Truyen Tran

In addition, previous DTA methods learn protein representation solely based on a small number of protein sequences in DTA datasets while neglecting the use of proteins outside of the DTA datasets.

Dynamic Language Binding in Relational Visual Reasoning

1 code implementation30 Apr 2020 Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains with applications in visual question answering.

Object Question Answering +2

Hierarchical Conditional Relation Networks for Video Question Answering

1 code implementation CVPR 2020 Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts.

Audio-Visual Question Answering (AVQA) Question Answering +4

Neural Reasoning, Fast and Slow, for Video Question Answering

no code implementations10 Jul 2019 Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

While recent advances in lingual and visual question answering have enabled sophisticated representations and neural reasoning mechanisms, major challenges in Video QA remain on dynamic grounding of concepts, relations and actions to support the reasoning process.

Natural Questions Question Answering +2

Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

no code implementations12 Sep 2018 Thao Minh Le, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda

With the widespread use of intelligent systems such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and in hospitals.