no code implementations • 4 Jul 2023 • Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring
Existing datasets for manually labelled, query-based video summarization are costly to create and thus small, limiting the performance of supervised deep video summarization models.
no code implementations • 4 Jul 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring
Multi-modal video summarization has a video input and a text-based query input.
no code implementations • 30 Apr 2023 • Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
In this work, a Causal Explainer, dubbed Causalainer, is proposed to address this issue.
no code implementations • 6 Apr 2023 • Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
2 code implementations • 13 Oct 2021 • Riccardo Di Sipio, Jia-Hong Huang, Samuel Yen-Chi Chen, Stefano Mangini, Marcel Worring
In this paper, we discuss initial attempts at improving the understanding of human language by combining deep-learning-based models with quantum computing.
no code implementations • 30 May 2021 • Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring
Automatically generating medical reports for retinal images is a promising way to help ophthalmologists reduce their workload and improve their efficiency.
1 code implementation • 26 Apr 2021 • Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring
Traditional video summarization methods generate fixed video representations regardless of user interest.
no code implementations • 26 Apr 2021 • Jia-Hong Huang, Ting-Wei Wu, Marcel Worring
A traditional medical image captioning model generates a medical description based only on a single input medical image.
1 code implementation • 1 Nov 2020 • Jia-Hong Huang, Chao-Han Huck Yang, Fangyu Liu, Meng Tian, Yi-Chieh Liu, Ting-Wei Wu, I-Hung Lin, Kang Wang, Hiromasa Morikawa, Hernghua Chang, Jesper Tegner, Marcel Worring
To train and validate the effectiveness of our DNN-based module, we propose a large-scale retinal disease image dataset.
1 code implementation • 7 Apr 2020 • Jia-Hong Huang, Marcel Worring
In this work, we introduce a method which takes a text-based query as input and generates a video summary corresponding to it.
no code implementations • 30 Nov 2019 • Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring
In this work, we propose a new method that uses semantically related questions, dubbed basic questions, acting as noise to evaluate the robustness of VQA models.
1 code implementation • 11 Feb 2019 • Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner
Age-Related Macular Degeneration (AMD) is an often asymptomatic retinal disease that may result in loss of vision.
1 code implementation • 16 Aug 2018 • C. -H. Huck Yang, Fangyu Liu, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, I-Hung Lin, Yi-Chieh Liu, Hao-Hsiang Yang, Jesper Tegner
Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists.
1 code implementation • 17 Jun 2018 • C. -H. Huck Yang, Jia-Hong Huang, Fangyu Liu, Fang-Yi Chiu, Mengya Gao, Weifeng Lyu, I-Hung Lin M. D., Jesper Tegner
Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists.
no code implementations • 16 Nov 2017 • Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem
In VQA, adversarial attacks can target the image and/or the proposed main question, yet there is a lack of proper analysis of the latter.
no code implementations • 14 Sep 2017 • Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, C. Huck Yang, Bernard Ghanem
Visual Question Answering (VQA) models should have both high robustness and accuracy.
no code implementations • 19 Mar 2017 • Jia-Hong Huang, Modar Alfadly, Bernard Ghanem
Given a natural language question about an image, the first module takes the question as input and outputs the basic questions of the given main question.