1 code implementation • 15 Jun 2024 • Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen
However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects or modifications to the raw video before publishing it on social media platforms.
1 code implementation • 9 May 2024 • Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen
Recent advancements in multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.
Ranked #1 on visual instruction following on LLaVA-Bench
no code implementations • CVPR 2023 • Chia-Wen Kuo, Zsolt Kira
The image captioning model efficiently encodes each view independently with a shared encoder, and a contrastive loss is incorporated across the encoded views in a novel way to improve their representation quality and the model's data efficiency.
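As a rough illustration of what a contrastive loss across encoded views can look like, the sketch below computes a generic InfoNCE-style loss in plain Python. It is not the paper's formulation (the excerpt does not describe it); the function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(views_a, views_b, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not the paper's loss).

    Row i of views_a is treated as the positive match for row i of
    views_b; every other row of views_b acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy encodings: two samples, two dimensions (hypothetical values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(aligned, aligned)
loss_mismatched = info_nce(aligned, swapped)
```

With these toy inputs, the loss is small when matching views are paired correctly and large when the pairing is swapped, which is the behavior a contrastive objective is meant to reward.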
no code implementations • 17 May 2023 • Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.
no code implementations • 20 Nov 2022 • Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira
In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on.
1 code implementation • CVPR 2022 • Chia-Wen Kuo, Zsolt Kira
A key limitation of such methods, however, is that the output of the model is conditioned only on the object detector's outputs.
Ranked #12 on Image Captioning on COCO Captions
4 code implementations • ICLR 2021 • Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda
To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.
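One common way to obtain a gradually progressing teacher is to update its weights as an exponential moving average (EMA) of the student's weights. The sketch below shows that update in plain Python; the dictionary-of-floats weight format, the function name `ema_update`, and the decay value are assumptions for illustration, not details taken from the excerpt above.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    `teacher` and `student` are dicts mapping parameter names to floats
    (a hypothetical stand-in for real model parameters).
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }

# Toy weights: the teacher drifts toward the student a little each step.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)
```

A decay close to 1 makes the teacher change slowly, so its predictions stay stable while the student is still training.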
2 code implementations • ECCV 2020 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.
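Consistency regularization, in its generic form, penalizes a model for giving different predictions on two transformed versions of the same unlabeled input. The sketch below is a minimal plain-Python version using the mean squared difference between class probabilities; the function names and the choice of squared error are assumptions for illustration, not the specific loss used by any method listed here.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(logits_view_a, logits_view_b):
    """Mean squared difference between the class probabilities that the
    model assigns to two augmented views of the same input."""
    probs_a = softmax(logits_view_a)
    probs_b = softmax(logits_view_b)
    return sum((p - q) ** 2 for p, q in zip(probs_a, probs_b)) / len(probs_a)

# Hypothetical model outputs for two augmentations of one image.
same = consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = consistency_loss([2.0, 0.0], [0.0, 2.0])
```

The loss is zero when both views yield identical predictions and grows as they diverge; no label is needed, which is why the term can be applied to unlabeled data.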
1 code implementation • 21 Mar 2020 • Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira
In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.
Multi-agent Reinforcement Learning • Reinforcement Learning • +2
no code implementations • 12 Jun 2019 • Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.
no code implementations • 16 Nov 2018 • Chia-Wen Kuo, Jacob Ashmore, David Huggins, Zsolt Kira
This paper presents a challenging computer vision task, namely the detection of generic components on a PCB, and a novel set of deep-learning methods that are able to jointly leverage the appearance of individual components and the propagation of information across the structure of the board to accurately detect and identify various types of components on a PCB.