Search Results for author: Yongxin Wang

Found 14 papers, 5 papers with code

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering
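
The merge step described above lends itself to a compact illustration. The following is a hedged sketch, not the authors' released implementation: it assumes the per-level predictions arrive as a list of same-cluster edges, collapses them into connected components with union-find, and pools member features to form the nodes of the next level.

```python
# Illustrative reconstruction of one merge level (assumptions: edges are
# predicted same-cluster pairs; super-node features are mean-pooled).
import numpy as np

def find(parent, i):
    # Union-find lookup with path compression.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def merge_level(features, positive_edges):
    """Collapse predicted connected components into next-level nodes."""
    n = features.shape[0]
    parent = list(range(n))
    for u, v in positive_edges:                  # union the endpoints of each predicted edge
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
    roots = [find(parent, i) for i in range(n)]
    # Mean-pool member features to form each super-node of the new graph.
    return np.stack([features[[i for i in range(n) if roots[i] == c]].mean(axis=0)
                     for c in sorted(set(roots))])

feats = np.random.randn(6, 4).astype(np.float32)
edges = [(0, 1), (1, 2), (3, 4)]                 # hypothetical level-1 predictions
print(merge_level(feats, edges).shape)           # (3, 4): components {0,1,2}, {3,4}, {5}
```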

Joint Object Detection and Multi-Object Tracking with Graph Neural Networks

1 code implementation • 23 Jun 2020 • Yongxin Wang, Kris Kitani, Xinshuo Weng

Although the two components depend on each other, prior works often design the detection and data association modules separately and train them with separate objectives.

Multi-Object Tracking Object +2

GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning

1 code implementation • 12 Jun 2020 • Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani

As a result, the feature of one object is informed of the features of other objects, so that it can lean towards objects with similar features (i.e., objects that probably share the same ID) and deviate from objects with dissimilar features (i.e., objects that probably have different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining features from either 2D or 3D space alone as in prior work, we propose a novel joint feature extractor that learns appearance and motion features from 2D and 3D space simultaneously.

3D Multi-Object Tracking Object
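
As a rough illustration of the feature-interaction idea in the snippet above (the form below is an assumption, not the paper's architecture), one round of message passing can weight every other object's feature by pairwise affinity, so likely same-ID features drift together while dissimilar ones contribute little; the joint 2D/3D feature is approximated here by simple concatenation.

```python
import numpy as np

def message_passing_step(feats):
    """feats: (N, D) per-object features (e.g., fused 2D/3D appearance + motion)."""
    sim = feats @ feats.T                            # pairwise affinities
    np.fill_diagonal(sim, -np.inf)                   # no self-messages
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over the other objects
    return feats + w @ feats                         # residual, similarity-weighted update

feat_2d = np.random.randn(5, 8)                      # hypothetical 2D appearance/motion features
feat_3d = np.random.randn(5, 8)                      # hypothetical 3D features
fused = np.concatenate([feat_2d, feat_3d], axis=1)   # joint 2D+3D feature per object
print(message_passing_step(fused).shape)             # (5, 16)
```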

MTAG: Modal-Temporal Attention Graph for Unaligned Human Multimodal Language Sequences

1 code implementation • NAACL 2021 • Jianing Yang, Yongxin Wang, Ruitao Yi, Yuying Zhu, Azaan Rehman, Amir Zadeh, Soujanya Poria, Louis-Philippe Morency

Human communication is multimodal in nature; it is through multiple modalities, such as language, voice, and facial expressions, that opinions and emotions are expressed.

Emotion Recognition Multimodal Sentiment Analysis

Pixel Invisibility: Detecting Objects Invisible in Color Images

no code implementations • 15 Jun 2020 • Yongxin Wang, Duminda Wijesekera

Despite the recent success of object detectors based on deep neural networks, their deployment in safety-critical applications such as self-driving cars remains questionable.

Knowledge Distillation object-detection +2

What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets

no code implementations • 7 Jul 2020 • Jianing Yang, Yuying Zhu, Yongxin Wang, Ruitao Yi, Amir Zadeh, Louis-Philippe Morency

In this paper, we analyze QA biases in popular video question answering datasets and discover that pretrained language models can answer 37-48% of questions correctly without using any multimodal context information, far exceeding the 20% random-guess baseline for 5-choose-1 multiple-choice questions.

Multiple-choice Question Answering +1
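
For intuition, the probe boils down to scoring a question-only model against the chance rate; the predictions below are made up for illustration:

```python
# Toy illustration (not the paper's evaluation code) of the bias probe:
# any question-only model beating 1/5 = 20% is exploiting language bias.
def accuracy(preds, answers):
    return sum(p == a for p, a in zip(preds, answers)) / len(answers)

random_baseline = 1 / 5                 # 5-choose-1 chance rate
preds   = [0, 2, 2, 4, 1, 3, 0, 2]      # hypothetical question-only predictions
answers = [0, 2, 1, 4, 1, 3, 2, 2]      # hypothetical ground truth
acc = accuracy(preds, answers)
print(f"question-only acc {acc:.0%} vs chance {random_baseline:.0%}")
# 75% on this toy data; the paper reports 37-48% on real datasets, still well above 20%.
```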

Weakly-Supervised Online Hashing

no code implementations • 16 Sep 2020 • Yu-Wei Zhan, Xin Luo, Yu Sun, Yongxin Wang, Zhen-Duo Chen, Xin-Shun Xu

However, existing hashing methods for social image retrieval operate in batch mode, which conflicts with the nature of social images, i.e., social images are usually generated periodically or collected in a streaming fashion.

Image Retrieval Retrieval
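
To make the batch-versus-stream contrast concrete, here is a minimal sketch of an online hasher that updates its projection from each arriving chunk; the similarity-preserving objective and its relaxation are assumptions for illustration, not the paper's method.

```python
import numpy as np

class OnlineHasher:
    def __init__(self, dim, bits, lr=0.01):
        self.W = np.random.randn(dim, bits) * 0.1    # hash projection
        self.lr = lr

    def hash(self, X):
        return np.sign(X @ self.W)                   # binary codes in {-1, +1}

    def partial_fit(self, X, S):
        """One streaming update. X: (n, dim) chunk, S: (n, n) similarity in {-1, +1}."""
        B = np.tanh(X @ self.W)                      # relaxed (continuous) codes
        bits = self.W.shape[1]
        # Gradient of ||B B^T - bits * S||^2 w.r.t. W (a common relaxation;
        # constant factors are folded into the learning rate).
        G = X.T @ ((B @ B.T - bits * S) @ B * (1 - B ** 2))
        self.W -= self.lr * G / len(X)

hasher = OnlineHasher(dim=16, bits=8)
for _ in range(3):                                   # chunks arriving in a stream
    X = np.random.randn(32, 16)
    S = np.sign(X @ X.T)                             # hypothetical pairwise labels
    hasher.partial_fit(X, S)
print(hasher.hash(np.random.randn(4, 16)))           # codes for unseen images
```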

Semi-TCL: Semi-Supervised Track Contrastive Representation Learning

no code implementations • 6 Jul 2021 • Wei Li, Yuanjun Xiong, Shuo Yang, Mingze Xu, Yongxin Wang, Wei Xia

We design a new instance-to-track matching objective to learn appearance embeddings by comparing a candidate detection to the embeddings of the tracks persisted in the tracker.

Multiple Object Tracking Object +1
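
A hedged sketch of what such an instance-to-track objective can look like; the cosine-similarity form and the 0.07 temperature are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def instance_to_track_loss(det_emb, track_embs, true_track):
    """det_emb: (D,) detection embedding; track_embs: (T, D) persisted track
    embeddings; true_track: index of the matching track."""
    det = det_emb / np.linalg.norm(det_emb)
    tracks = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    logits = tracks @ det / 0.07                  # cosine similarity / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_track])             # cross-entropy on the true track

track_embs = np.random.randn(4, 32)               # embeddings persisted in the tracker
det_emb = track_embs[2] + 0.1 * np.random.randn(32)
print(instance_to_track_loss(det_emb, track_embs, true_track=2))
```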

ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator

no code implementations • 24 Mar 2022 • Zi-Chao Zhang, Zhen-Duo Chen, Yongxin Wang, Xin Luo, Xin-Shun Xu

Recently, several Vision Transformer (ViT) based methods have been proposed for Fine-Grained Visual Classification (FGVC). These methods significantly surpass existing CNN-based ones, demonstrating the effectiveness of ViT in FGVC tasks. However, there are limitations when applying ViT directly to FGVC. First, ViT splits images into patches and computes attention over every pair of patches, which can incur heavy redundant computation and unsatisfactory performance on fine-grained images with complex backgrounds and small objects. Second, a standard ViT uses only the class token of the final layer for classification, which is not enough to extract comprehensive fine-grained information.

Fine-Grained Image Classification
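
The first limitation is easy to quantify: attention cost grows with the square of the patch count, much of which may be spent on background when the object of interest is small. A back-of-the-envelope sketch (the 448x448 input and 16x16 patches are illustrative choices):

```python
# Quadratic attention cost of patch-based ViT, for illustration only.
def attention_pairs(image_size=448, patch_size=16):
    n = (image_size // patch_size) ** 2      # number of patch tokens
    return n, n * n                          # tokens, attended pairs per layer

tokens, pairs = attention_pairs()
print(tokens, pairs)                         # 784 tokens -> 614,656 pairs per layer
```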

Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval

no code implementations • 12 Apr 2022 • Yu-Wei Zhan, Xin Luo, Yongxin Wang, Zhen-Duo Chen, Xin-Shun Xu

To narrow the domain gap between sketches and images, we extract edge maps for natural images and treat them as a bridge between the two domains, since edge maps share content with the images and style with the sketches.

Retrieval Sketch-Based Image Retrieval
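
A minimal sketch of the edge-map bridge follows; the abstract does not name an edge detector, so Canny is an assumption here, and any detector producing sketch-like maps would play the same role.

```python
import cv2
import numpy as np

def to_edge_map(image_bgr):
    # Detector choice is an assumption; the paper only says "edge maps".
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)      # suppress texture noise
    return cv2.Canny(gray, 100, 200)              # sketch-like binary edges

img = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)  # stand-in image
edges = to_edge_map(img)
print(edges.shape, edges.dtype)                   # (224, 224) uint8
```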

Prototype-Based Layered Federated Cross-Modal Hashing

no code implementations • 27 Oct 2022 • Jiale Liu, Yu-Wei Zhan, Xin Luo, Zhen-Duo Chen, Yongxin Wang, Xin-Shun Xu

Moreover, statistical heterogeneity, model heterogeneity, and the requirement that every client accept the same parameters make it very tricky to apply federated learning to cross-modal hash learning.

Personalized Federated Learning
