no code implementations • CVPR 2024 • Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang
In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings.
Ranked #5 on Semantic Segmentation on S3DIS Area5
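A minimal sketch of the kernel-point convolution idea described above: each kernel point carries its own weight matrix, and a neighbor's feature contributes through a linear influence function of its distance to that kernel point. All names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def kpconv_point(center, neighbors, feats, kernel_pts, weights, sigma=1.0):
    """Kernel-point convolution at one query point (illustrative sketch).

    center:     (3,)  query point
    neighbors:  (N, 3) neighbor coordinates
    feats:      (N, Cin) neighbor features
    kernel_pts: (K, 3) kernel point offsets placed around the center
    weights:    (K, Cin, Cout) one weight matrix per kernel point
    """
    rel = neighbors - center                        # neighbor offsets
    out = np.zeros(weights.shape[2])
    for k in range(kernel_pts.shape[0]):
        # linear influence: 1 at the kernel point, 0 beyond radius sigma
        dist = np.linalg.norm(rel - kernel_pts[k], axis=1)
        corr = np.maximum(0.0, 1.0 - dist / sigma)  # (N,)
        out += (corr @ feats) @ weights[k]          # aggregate, then project
    return out
```

The key contrast with MLP-based point convolutions is that the learned weights live at fixed spatial locations (the kernel points), and only the scalar influence depends on geometry.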
no code implementations • 1 Feb 2024 • Yao-Hung Hubert Tsai, Walter Talbott, Jian Zhang
This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in language models.
no code implementations • 16 Oct 2023 • Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai
We find that the model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training.
no code implementations • 12 Oct 2023 • Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, BoWen Zhang, Jian Zhang
Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.
no code implementations • 9 Dec 2022 • So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang
In contrast, our LocCon shows the most robust transfer to the real world among the models we compare, and the real-world performance of all models can be further improved with self-supervised LocCon in-situ training.
no code implementations • 22 Oct 2022 • Runxiang Cheng, Gargi Balasubramaniam, Yifei He, Yao-Hung Hubert Tsai, Han Zhao
We formulate a theoretical framework for optimizing modality selection in multimodal learning and introduce a utility measure to quantify the benefit of selecting a modality.
no code implementations • 27 Sep 2022 • Yao-Hung Hubert Tsai, Hanlin Goh, Ali Farhadi, Jian Zhang
The perception system in personalized mobile agents requires developing indoor scene understanding models, which can understand 3D geometries, capture objectness, analyze human behaviors, etc.
no code implementations • 25 Sep 2022 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang
Since no ground truth captions are available for novel object images during training, our P2C leverages cross-modality (image-text) association modules to ensure the above caption characteristics can be properly preserved.
1 code implementation • 22 Jul 2022 • Peter Naylor, Yao-Hung Hubert Tsai, Marick Laé, Makoto Yamada
Recent developments in self-supervised learning give us the possibility to further reduce human intervention in multi-step pipelines that revolve around particular objects of interest.
1 code implementation • ICLR 2022 • Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency
The first stage is to cluster data according to its auxiliary information.
1 code implementation • ICLR 2022 • Yao-Hung Hubert Tsai, Tianqin Li, Martin Q. Ma, Han Zhao, Kun Zhang, Louis-Philippe Morency, Ruslan Salakhutdinov
Conditional contrastive learning frameworks consider the conditional sampling procedure that constructs positive or negative data pairs conditioned on specific variables.
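A toy illustration of the conditional sampling procedure: negatives are drawn only from samples that share the anchor's conditioning value, so the contrastive signal cannot exploit that variable. The variable name and setup are hypothetical, not taken from the paper.

```python
import numpy as np

def conditional_negatives(z, rng):
    """For each anchor i, draw a negative from samples sharing the same
    conditioning value z[i] (e.g. a sensitive attribute or auxiliary label;
    illustrative choice). z: (N,) discrete conditioning variable."""
    n = len(z)
    negs = np.empty(n, dtype=int)
    for i in range(n):
        # candidate pool: same condition as the anchor, excluding the anchor
        pool = np.flatnonzero((z == z[i]) & (np.arange(n) != i))
        negs[i] = rng.choice(pool)
    return negs
```

In practice such frameworks embed this sampling inside an InfoNCE-style loss; the sketch isolates only the conditioning step.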
no code implementations • 29 Sep 2021 • Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Yu-Chiang Frank Wang, Louis-Philippe Morency, Ruslan Salakhutdinov
Novel object captioning (NOC) learns image captioning models for describing objects or visual concepts which are unseen (i.e., novel) in the training captions.
10 code implementations • 14 Jun 2021 • Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.
Ranked #5 on Speech Recognition on LibriSpeech test-other
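Problem (2) above — no lexicon of sound units — is addressed by generating discrete pseudo-labels offline, e.g. by clustering frame features, and then training a masked-prediction model on those labels. A toy sketch of the clustering step (the feature choice and deterministic initialization are simplifications for illustration):

```python
import numpy as np

def kmeans_pseudo_labels(feats, k=2, iters=10):
    """Assign each acoustic frame a discrete pseudo-label via toy k-means.

    feats: (T, D) frame features (e.g. MFCCs; illustrative input).
    Returns (T,) cluster ids that a masked-prediction model would be
    trained to predict for masked frames.
    """
    # deterministic spread initialization (simplification)
    centers = feats[np.linspace(0, len(feats) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)            # nearest-center assignment
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels
```

Because the labels need only be consistent rather than phonetically correct, even noisy cluster assignments can supervise representation learning.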
no code implementations • 5 Jun 2021 • Martin Q. Ma, Yao-Hung Hubert Tsai, Paul Pu Liang, Han Zhao, Kun Zhang, Ruslan Salakhutdinov, Louis-Philippe Morency
In this paper, we propose a Conditional Contrastive Learning (CCL) approach to improve the fairness of contrastive SSL methods.
no code implementations • 5 Jun 2021 • Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency
Our approach contributes as follows: 1) Compared to conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring the performance closer to the supervised representations; 2) The presented Cl-InfoNCE can also work with clusters constructed without supervision (e.g., k-means clusters) and outperforms strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better approach to leverage the data clustering information, by comparing it to the baseline approach of learning to predict the clustering assignments with a cross-entropy loss.
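A simplified sketch of a cluster-conditioned InfoNCE loss in the spirit of Cl-InfoNCE: samples sharing a cluster assignment act as positives, everything else as negatives. The exact weighting in the paper differs; this is an illustrative reduction.

```python
import numpy as np

def cl_infonce_loss(z, clusters, temp=0.5):
    """Contrastive loss with cluster-defined positives (simplified sketch).

    z:        (N, D) embeddings
    clusters: (N,) cluster assignments (auxiliary labels or k-means ids)
    """
    n = len(z)
    sim = z @ z.T / temp
    loss, count = 0.0, 0
    for i in range(n):
        others = np.arange(n) != i
        pos = others & (clusters == clusters[i])   # same-cluster positives
        if not pos.any():
            continue
        log_denom = np.log(np.exp(sim[i][others]).sum())
        loss += -(sim[i][pos] - log_denom).mean()  # avg over positives
        count += 1
    return loss / count
```

When embeddings align with the clustering, the loss is low; shuffling the cluster assignments raises it, which is the signal the objective exploits.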
2 code implementations • 28 Apr 2021 • Yao-Hung Hubert Tsai, Shaojie Bai, Louis-Philippe Morency, Ruslan Salakhutdinov
In this report, we relate the algorithmic design of Barlow Twins' method to the Hilbert-Schmidt Independence Criterion (HSIC), thus establishing it as a contrastive learning approach that is free of negative samples.
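For reference, a compact sketch of the Barlow Twins objective being related to HSIC: push the cross-correlation matrix of two views' batch-normalized embeddings toward the identity, with no negative samples involved. Hyperparameter values are illustrative.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins-style loss (sketch): diagonal of the cross-correlation
    matrix -> 1 (invariance), off-diagonal -> 0 (redundancy reduction).

    z1, z2: (N, D) embeddings of two augmented views of the same batch.
    """
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)   # batch-normalize each dim
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                              # (D, D) cross-correlation
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

The negative-free structure is what makes the HSIC connection natural: both quantities are functionals of cross-view statistics rather than of explicit negative pairs.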
1 code implementation • ICLR 2021 • Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov
This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.
no code implementations • 1 Jan 2021 • Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.
no code implementations • 27 Oct 2020 • Shangda Li, Devendra Singh Chaplot, Yao-Hung Hubert Tsai, Yue Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
We further show that our method can be used to transfer the navigation policies learned in simulation to the real world.
1 code implementation • ICLR 2021 • Yao-Hung Hubert Tsai, Yue Wu, Ruslan Salakhutdinov, Louis-Philippe Morency
In particular, we propose a composite objective that bridges the gap between prior contrastive and predictive learning objectives, and introduce an additional objective term to discard task-irrelevant information.
1 code implementation • NeurIPS 2020 • Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables.
1 code implementation • 25 May 2020 • Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.
1 code implementation • EMNLP 2020 • Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Ruslan Salakhutdinov, Louis-Philippe Morency
The human language can be expressed through multiple sources of information known as modalities, including tones of voice, facial gestures, and spoken language.
2 code implementations • ICLR 2020 • Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov
We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent's state and the child's vote.
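A stripped-down sketch of the routing idea: each child's routing weights are a softmax (over parents) of the dot-product agreement between the parent state and the child's vote, and parent states are the resulting weighted averages. Initialization and normalization details are simplified relative to the paper.

```python
import numpy as np

def route(votes, iters=3):
    """Agreement-based routing (sketch). votes: (I, J, D), child i's vote
    for parent j. Returns parent states (J, D) and routing weights (I, J)."""
    parents = votes.mean(axis=0)                          # init parent states
    for _ in range(iters):
        # agreement between each child's vote and the current parent state
        agree = np.einsum('ijd,jd->ij', votes, parents)   # (I, J)
        agree = agree - agree.max(axis=1, keepdims=True)
        r = np.exp(agree)
        r /= r.sum(axis=1, keepdims=True)                 # softmax over parents
        # parent state: routing-weighted average of incoming votes
        parents = np.einsum('ij,ijd->jd', r, votes) / (r.sum(axis=0)[:, None] + 1e-8)
    return parents, r
```

The softmax over parents (rather than over children) is what makes each child distribute its output competitively among parents.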
no code implementations • IJCNLP 2019 • Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
This new formulation gives us a better way to understand individual components of the Transformer's attention, such as how to better integrate the positional embedding.
1 code implementation • 22 Oct 2019 • Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov
While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers.
Ranked #2 on Music Transcription on MusicNet
1 code implementation • 5 Sep 2019 • Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang
To estimate the mutual information from data, a common practice is preparing a set of paired samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n \stackrel{\mathrm{i.i.d.}}{\sim} p(\mathbf{x},\mathbf{y})$.
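The baseline this setting starts from can be made concrete with a toy histogram plug-in estimator over fully paired samples (the paper's contribution is the harder semi-supervised case where most samples are unpaired; that part is not sketched here):

```python
import numpy as np

def mi_plugin(x, y, bins=8):
    """Histogram plug-in estimate of I(X;Y) from paired 1-D samples (toy).

    Estimates the joint by 2-D binning, then sums p(x,y) log p(x,y)/(p(x)p(y)).
    """
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                               # empirical joint
    px = pxy.sum(axis=1, keepdims=True)            # marginals
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                   # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

Plug-in estimators like this are biased and scale poorly with dimension, which motivates the optimal-transport-based approach to exploiting unpaired data.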
1 code implementation • EMNLP 2019 • Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
This new formulation gives us a better way to understand individual components of the Transformer's attention, such as how to better integrate the positional embedding.
1 code implementation • NeurIPS 2019 • Han Zhao, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Geoffrey J. Gordon
Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis.
no code implementations • ACL 2019 • Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, Louis-Philippe Morency
Our method is based on the observation that high-dimensional multimodal time series data often exhibit correlations across time and modalities which leads to low-rank tensor representations.
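The low-rank structure can be illustrated with a CP (canonical polyadic) reconstruction: a 3-way (time x modality x feature) tensor is expressed through R rank-one components, so only R*(I+J+K) parameters are stored instead of I*J*K. The axis interpretation is an illustrative mapping, not the paper's exact parameterization.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from rank-R CP factor matrices.

    A: (I, R), B: (J, R), C: (K, R). Entry (i, j, k) is
    sum_r A[i, r] * B[j, r] * C[k, r].
    """
    return np.einsum('ir,jr,kr->ijk', A, B, C)
```

Cross-time and cross-modality correlations are exactly what make a small R a good approximation for such data.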
4 code implementations • ACL 2019 • Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov
Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors.
Ranked #6 on Multimodal Sentiment Analysis on MOSI
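The core operation in such multimodal Transformers is directional cross-modal attention: the target modality supplies queries while the source modality supplies keys and values, so the target's sequence length is preserved while it absorbs information from the source. A minimal sketch with random, untrained projection weights:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def crossmodal_attention(x_tgt, x_src, Wq, Wk, Wv):
    """Directional cross-modal attention (sketch).

    x_tgt: (Tt, D) target-modality sequence (supplies queries)
    x_src: (Ts, D) source-modality sequence (supplies keys/values)
    """
    Q = x_tgt @ Wq                                           # (Tt, d)
    K = x_src @ Wk                                           # (Ts, d)
    V = x_src @ Wv                                           # (Ts, d)
    att = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=-1)    # (Tt, Ts)
    return att @ V                                           # (Tt, d)
```

Because attention is computed across the full (Tt, Ts) grid, the two streams need not be aligned or equal in length, which is the point for unaligned multimodal sequences.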
1 code implementation • NAACL 2019 • Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency
Human language is a rich multimodal signal consisting of spoken words, facial expressions, body gestures, and vocal intonations.
no code implementations • ICLR 2019 • Han Zhao, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Geoff Gordon
Learning deep neural networks could be understood as the combination of representation learning and learning halfspaces.
1 code implementation • CVPR 2019 • Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi
Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts.
2 code implementations • ICLR 2019 • Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov
Multimodal discriminative factors are shared across all modalities and contain joint multimodal features required for discriminative tasks such as sentiment prediction.
no code implementations • ICLR 2019 • Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu
In the paper, we propose a post selection inference (PSI) framework for divergence measure, which can select a set of statistically significant features that discriminate two distributions.
no code implementations • 15 Feb 2018 • Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu
"Which Generative Adversarial Network (GAN) generates the most plausible images?"
no code implementations • 15 Feb 2018 • Denny Wu, Yixiu Zhao, Yao-Hung Hubert Tsai, Makoto Yamada, Ruslan Salakhutdinov
To address this issue, we propose to measure the dependency instead of MI between layers in DNNs.
no code implementations • ICLR 2018 • Yao-Hung Hubert Tsai, Han Zhao, Nebojsa Jojic, Ruslan Salakhutdinov
The assumption that data samples are independently and identically distributed is the backbone of many learning algorithms.
no code implementations • ICLR 2018 • Yao-Hung Hubert Tsai, Han Zhao, Ruslan Salakhutdinov, Nebojsa Jojic
In this technical report, we introduce OrderNet that can be used to extract the order of data instances in an unsupervised way.
no code implementations • 23 Oct 2017 • Yao-Hung Hubert Tsai, Ruslan Salakhutdinov
We introduce two statistical approaches for fusing side information into data representation learning to improve one-shot learning.
no code implementations • 7 Jun 2017 • Chih-Kuan Yeh, Yao-Hung Hubert Tsai, Yu-Chiang Frank Wang
In other words, our GDVM casts the supervised learning task as a generative learning process, in which data discrimination is jointly exploited for improved classification.
no code implementations • ICCV 2017 • Yao-Hung Hubert Tsai, Liang-Kang Huang, Ruslan Salakhutdinov
Many of the existing methods for learning a joint embedding of images and text use only supervised information from paired images and their textual attributes.
Ranked #5 on Generalized Few-Shot Learning on CUB
no code implementations • CVPR 2016 • Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang
With the goal of deriving a domain-invariant feature subspace for HDA, our CDLS is able to identify representative cross-domain data, including the unlabeled ones in the target domain, for performing adaptation.
no code implementations • ICCV 2015 • Tzu Ming Harry Hsu, Wei Yu Chen, Cheng-An Hou, Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang
For standard unsupervised domain adaptation, one typically obtains labeled data in the source domain and only observes unlabeled data in the target domain.