Search Results for author: Chia-Wen Kuo

Found 11 papers, 6 papers with code

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

1 code implementation 15 Jun 2024 Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut the raw video and add effects or modifications before publishing it on social media platforms.

Question Answering, Video Understanding

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

1 code implementation 9 May 2024 Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.

Image Captioning, Visual Instruction Following

HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning

no code implementations CVPR 2023 Chia-Wen Kuo, Zsolt Kira

The image captioning model encodes each view independently with a shared encoder efficiently, and a contrastive loss is incorporated across the encoded views in a novel way to improve their representation quality and the model's data efficiency.

Caption Generation, Decoder

CLIP-GCD: Simple Language Guided Generalized Category Discovery

no code implementations 17 May 2023 Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.

Clustering, Retrieval

Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation

no code implementations 20 Nov 2022 Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira

In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on.

Test unseen, Vision and Language Navigation

Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

1 code implementation CVPR 2022 Chia-Wen Kuo, Zsolt Kira

A key limitation of such methods, however, is that the output of the model is conditioned only on the object detector's outputs.

Image Captioning Object

Unbiased Teacher for Semi-Supervised Object Detection

4 code implementations ICLR 2021 Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Image Classification Object +4

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

2 code implementations ECCV 2020 Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.

Clustering Data Augmentation +1

Who2com: Collaborative Perception via Learnable Handshake Communication

1 code implementation 21 Mar 2020 Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira

In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.

Multi-agent Reinforcement Learning, Reinforcement Learning

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification

no code implementations 12 Jun 2019 Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.

Classification General Classification +1

Data-Efficient Graph Embedding Learning for PCB Component Detection

no code implementations 16 Nov 2018 Chia-Wen Kuo, Jacob Ashmore, David Huggins, Zsolt Kira

This paper presents a challenging computer vision task, namely the detection of generic components on a PCB, and a novel set of deep-learning methods that are able to jointly leverage the appearance of individual components and the propagation of information across the structure of the board to accurately detect and identify various types of components on a PCB.

Graph Embedding object-detection +2
