RGB-T Tracking Based on Mixed Attention

no code implementations • 9 Apr 2023 • Yang Luo, Xiqing Guo, Mingtao Dong, Jin Yu

This paper proposes MACFT, an RGB-T tracker that uses a mixed attention mechanism to achieve complementary fusion of modalities.

RGB-T Tracking

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

no code implementations • 16 Jul 2022 • Wei Wu, Junlin He, Yu Qiao, Guoheng Fu, Li Liu, Jin Yu

In-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints.

VocalSound: A Dataset for Improving Human Vocal Sounds Recognition

1 code implementation • 6 May 2022 • Yuan Gong, Jin Yu, James Glass

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.

Audio Classification

Cross-domain User Preference Learning for Cold-start Recommendation

no code implementations • 7 Dec 2021 • Huiling Zhou, Jie Liu, Zhikang Li, Jin Yu, Hongxia Yang

With user history represented by a domain-aware sequential model, a frequency encoder is applied to the underlying tags for user content preference learning.

Recommendation Systems

Stable Prediction on Graphs with Agnostic Distribution Shift

no code implementations • 8 Oct 2021 • Shengyu Zhang, Kun Kuang, Jiezhong Qiu, Jin Yu, Zhou Zhao, Hongxia Yang, Zhongfei Zhang, Fei Wu

The results demonstrate that our method outperforms various SOTA GNNs for stable prediction on graphs with agnostic distribution shift, including shift caused by node labels and attributes.

Graph Learning, Recommendation Systems (+1 more)

M6: A Chinese Multimodal Pretrainer

no code implementations • 1 Mar 2021 • Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang

In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB of images and 292GB of texts that cover a wide range of domains.

Image Generation

DeVLBert: Learning Deconfounded Visio-Linguistic Representations

1 code implementation • 16 Aug 2020 • Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu, Jin Yu, Hongxia Yang, Fei Wu

In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of downstream data on which the pretrained model will be fine-tuned.

Image Retrieval, Question Answering (+2 more)

Poet: Product-oriented Video Captioner for E-commerce

1 code implementation • 16 Aug 2020 • Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Jie Liu, Jingren Zhou, Hongxia Yang, Fei Wu

Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs for capturing the dynamic change of fine-grained product-part characteristics.

Video Captioning

A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning

no code implementations • 19 Jul 2020 • Xuandong Zhao, Jinbao Xue, Jin Yu, Xi Li, Hongxia Yang

In real-world applications, networks usually consist of billions of various types of nodes and edges with abundant attributes.

Link Prediction, Network Embedding

Comprehensive Information Integration Modeling Framework for Video Titling

1 code implementation • 24 Jun 2020 • Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, Fei Wu

In e-commerce, consumer-generated videos, which in general deliver consumers' individual preferences for the different aspects of certain products, are massive in volume.

Descriptive Video Captioning

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

no code implementations • 29 Feb 2020 • Shengyu Zhang, Tan Jiang, Qinghao Huang, Ziqi Tan, Zhou Zhao, Siliang Tang, Jin Yu, Hongxia Yang, Yi Yang, Fei Wu

Existing image completion procedures are highly subjective because they consider only visual context, which can produce unpredictable results that are plausible but not faithful to grounded knowledge.

Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

no code implementations • NeurIPS 2011 • Trung T. Pham, Tat-Jun Chin, Jin Yu, David Suter

Multi-structure model fitting has traditionally taken a two-stage approach: first, sample a (large) number of model hypotheses; then, select the subset of hypotheses that optimises a joint fitting and model selection criterion.

Model Selection

