Search Results for author: Po-Yao Huang

Found 14 papers, 6 papers with code

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

1 code implementation EMNLP 2021 Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Action Segmentation Video Retrieval

Audio-Visual Event Recognition through the lens of Adversary

no code implementations 15 Nov 2020 Juncheng B Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze

This work aims to study several key questions related to multimodal learning through the lens of adversarial noises: 1) the trade-off between early/middle/late fusion and its effect on robustness and accuracy; 2) how different frequency/time-domain features contribute to robustness.

Support-set bottlenecks for video-text representation learning

no code implementations ICLR 2021 Mandela Patrick, Po-Yao Huang, Yuki Asano, Florian Metze, Alexander Hauptmann, João Henriques, Andrea Vedaldi

The dominant paradigm for learning video-text representations -- noise contrastive learning -- increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes away the representations of all other pairs.

Contrastive Learning Representation Learning +1

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

no code implementations 1 Jun 2020 Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, Xin Wang

Neural Architecture Search (NAS) is just such a revolutionary algorithm, and the related research work is complicated and rich.

Neural Architecture Search

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

no code implementations IJCNLP 2019 Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann

With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.

Image Retrieval Object Detection +1

RWR-GAE: Random Walk Regularization for Graph Auto Encoders

1 code implementation 12 Aug 2019 Vaibhav, Po-Yao Huang, Robert Frederking

Node embeddings have become a ubiquitous technique for representing graph data in a low-dimensional space.

Ranked #1 on Graph Clustering on Pubmed (NMI metric)

Graph Clustering Link Prediction +1

RCAA: Relational Context-Aware Agents for Person Search

no code implementations ECCV 2018 Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann

In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole scene images.

Person Search
