no code implementations • 2 Dec 2024 • Alberto Gonzalo Rodriguez Salgado, Maying Schen, Philipp Harzig, Peter Mayer, Jose M. Alvarez
Robustness to out-of-distribution data is crucial for deploying modern neural networks.
no code implementations • 28 Dec 2021 • Philipp Harzig, Moritz Einfalt, Rainer Lienhart
Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can, for instance, help visually impaired people understand the scenes of a YouTube video.
no code implementations • 28 Dec 2021 • Philipp Harzig, Moritz Einfalt, Katja Ludwig, Rainer Lienhart
For both models, we train on the complete VATEX dataset and 90% of the TRECVID-VTT dataset for pretraining while using the remaining 10% for validation.
no code implementations • 6 Aug 2019 • Philipp Harzig, Yan-Ying Chen, Francine Chen, Rainer Lienhart
Automatic medical report generation from chest X-ray images is one way to assist doctors and reduce their workload.
1 code implementation • 6 May 2019 • Philipp Harzig, Dan Zecha, Rainer Lienhart, Carolin Kaiser, René Schallner
Furthermore, we introduce a novel metric that allows us to assess whether the generated captions meet our requirements (i.e., subject, predicate, object, and product name), and we describe a series of experiments on caption quality and on how to address annotator disagreements in the image ratings with an approach called soft targets.
no code implementations • 6 Feb 2018 • Philipp Harzig, Stephan Brehm, Rainer Lienhart, Carolin Kaiser, René Schallner
Thanks to adding the third output modality, it also considerably improves the quality of generated captions for images depicting branded products.