Search Results for author: Ting Yao

Found 64 papers, 21 papers with code

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

1 code implementation 18 Aug 2021 Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei

Nevertheless, there has not been an open-source codebase in support of training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.

Cross-Modal Retrieval Image Captioning +4

A Low Rank Promoting Prior for Unsupervised Contrastive Learning

no code implementations 5 Aug 2021 Yu Wang, Jingyang Lin, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

In this paper, we construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning, referred to as LORAC.

Contrastive Learning Image Classification +4

Contextual Transformer Networks for Visual Recognition

3 code implementations 26 Jul 2021 Yehao Li, Ting Yao, Yingwei Pan, Tao Mei

Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.

Instance Segmentation Object Detection +1

Representing Videos As Discriminative Sub-Graphs for Action Recognition

no code implementations CVPR 2021 Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.

Action Recognition Graph Learning +1

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

1 code implementation 27 Jan 2021 Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.

ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks

1 code implementation 16 Jan 2021 Bingning Wang, Ting Yao, WeiPeng Chen, Jingfang Xu, Xiaochuan Wang

In compositional question answering, the system should assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA.

Answer Selection Machine Reading Comprehension

A Style and Semantic Memory Mechanism for Domain Generalization

no code implementations ICCV 2021 Yang Chen, Yu Wang, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains.

Domain Generalization

Condensing a Sequence to One Informative Frame for Video Recognition

no code implementations ICCV 2021 Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei

This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits off-the-shelf image recognition system on the synthetic frame.

Motion Estimation Video Recognition

Optimization Planning for 3D ConvNets

no code implementations 1 Jan 2021 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

In this paper, we decompose the path into a series of training “states” and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.

Video Recognition

Motion-Focused Contrastive Learning of Video Representations

no code implementations ICCV 2021 Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.

Contrastive Learning Data Augmentation +2

Joint Contrastive Learning with Infinite Possibilities

1 code implementation NeurIPS 2020 Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

This paper explores useful modifications of the recent development in contrastive learning via novel probabilistic modeling.

Contrastive Learning

Learning to Localize Actions from Moments

1 code implementation ECCV 2020 Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.

Action Localization Transfer Learning

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

2 code implementations 3 Aug 2020 Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

In this paper, we compose a trilogy of exploring the basic and generic supervision in the sequence from spatial, spatiotemporal and sequential perspectives.

Action Recognition Contrastive Learning +3

Pre-training for Video Captioning Challenge 2020 Summary

no code implementations 27 Jul 2020 Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei

The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.

Video Captioning

Single Shot Video Object Detector

1 code implementation 7 Jul 2020 Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object Detection

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

no code implementations 5 Jul 2020 Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei

In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.

Question Answering Video Captioning +2

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

1 code implementation 22 Jun 2020 Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang

The release of ReCO consists of 300k questions, which to our knowledge makes it the largest dataset in Chinese reading comprehension.

Causal Inference Chinese Reading Comprehension +2

Transferring and Regularizing Prediction for Semantic Segmentation

no code implementations CVPR 2020 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei

In view of extremely expensive expert labeling, recent research has shown that models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.

Domain Adaptation Semantic Segmentation

Learning a Unified Sample Weighting Network for Object Detection

1 code implementation CVPR 2020 Qi Cai, Yingwei Pan, Yu Wang, Jingen Liu, Ting Yao, Tao Mei

To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample's task weights.

Classification General Classification +2

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

no code implementations CVPR 2020 Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei

A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.

Unsupervised Domain Adaptation

A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction

1 code implementation ACL 2020 Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, Minlie Huang

Neural models have achieved great success on machine reading comprehension (MRC), many of which typically consist of two components: an evidence extractor and an answer predictor.

Machine Reading Comprehension Multi-Choice MRC +1

X-Linear Attention Networks for Image Captioning

1 code implementation CVPR 2020 Yingwei Pan, Ting Yao, Yehao Li, Tao Mei

Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the second-order interactions across multi-modal inputs.

Fine-Grained Visual Recognition Image Captioning +2

Long Short-Term Relation Networks for Video Action Detection

no code implementations 31 Mar 2020 Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei

It has been well recognized that modeling human-object or object-object relations would be helpful for the detection task.

Action Detection Region Proposal

Three-dimensional cell culture model for hepatocytes opens a new avenue of real world research on liver

no code implementations 19 Nov 2019 Ting Yao, Yi Zhang, Mengjiao Lv, Guoqing Zang, Soon Seng Ng, Xiaohua Chen

The three-dimensional (3D) culture model is a valuable in vitro tool to study liver biology, metabolism, organogenesis, tissue morphology, drug discovery and cell-based assays.

Drug Discovery

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019

2 code implementations 8 Oct 2019 Yingwei Pan, Yehao Li, Qi Cai, Yang Chen, Ting Yao

Semi-Supervised Domain Adaptation: For this task, we adopt a standard self-learning framework to construct a classifier based on the labeled source and target data, and generate the pseudo labels for unlabeled target data.

Domain Adaptation

Scheduled Differentiable Architecture Search for Visual Recognition

no code implementations 23 Sep 2019 Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei

Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.

Video Recognition

Deep Metric Learning with Density Adaptivity

no code implementations 9 Sep 2019 Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.

Metric Learning

Hierarchy Parsing for Image Captioning

no code implementations ICCV 2019 Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

It is widely believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image.

Hierarchical structure Image Captioning

Mocycle-GAN: Unpaired Video-to-Video Translation

no code implementations 26 Aug 2019 Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Unsupervised image-to-image translation is the task of translating an image from one domain to another in the absence of any paired training examples and tends to be more applicable to practical applications.

Motion Estimation Translation +1

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

1 code implementation 16 Aug 2019 Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei

It is widely believed that Binary Neural Networks (BNNs) could drastically accelerate inference by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations.

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

no code implementations 1 Aug 2019 Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei

A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.

vireoJD-MM at Activity Detection in Extended Videos

no code implementations 20 Jun 2019 Fuchen Long, Qi Cai, Zhaofan Qiu, Zhijian Hou, Yingwei Pan, Ting Yao, Chong-Wah Ngo

This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019.

Action Detection Action Localization +1

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

no code implementations 14 Jun 2019 Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao

This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.

Action Recognition Spatio-Temporal Action Localization

Learning Spatio-Temporal Representation with Local and Global Diffusion

no code implementations CVPR 2019 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei

Diffusions effectively interact two aspects of information, i.e., localized and holistic, for a more powerful way of representation learning.

Action Classification Action Detection +3

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

1 code implementation 3 May 2019 Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei

Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations.

Video Captioning

Exploring Object Relation in Mean Teacher for Cross-Domain Detection

1 code implementation CVPR 2019 Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Ling-Yu Duan, Ting Yao

The whole architecture is then optimized with three consistency regularizations: 1) region-level consistency to align the region-level predictions between teacher and student, 2) inter-graph consistency for matching the graph structures between teacher and student, and 3) intra-graph consistency to enhance the similarity between regions of the same class within the graph of the student.

Unsupervised Domain Adaptation

Transferrable Prototypical Networks for Unsupervised Domain Adaptation

no code implementations CVPR 2019 Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei

Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.

Unsupervised Domain Adaptation

Sogou Machine Reading Comprehension Toolkit

1 code implementation 28 Mar 2019 Jindou Wu, Yunlun Yang, Chao Deng, Hongyi Tang, Bingning Wang, Haoze Sun, Ting Yao, Qi Zhang

In this paper, we present a Sogou Machine Reading Comprehension (SMRC) toolkit that can be used to provide the fast and efficient development of modern machine comprehension models, including both published models and original prototypes.

Machine Reading Comprehension

Exploring Visual Relationship for Image Captioning

no code implementations ECCV 2018 Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.

Image Captioning

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations ECCV 2018 Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in next frame in a recurrent manner.

Action Detection Region Proposal

YH Technologies at ActivityNet Challenge 2018

no code implementations 29 Jun 2018 Ting Yao, Xue Li

This notebook paper presents an overview and comparative analysis of our systems designed for the following five tasks in ActivityNet Challenge 2018: temporal action proposals, temporal action localization, dense-captioning events in videos, trimmed action recognition, and spatio-temporal action localization.

Action Recognition Spatio-Temporal Action Localization

Fully Convolutional Adaptation Networks for Semantic Segmentation

no code implementations CVPR 2018 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.

Domain Adaptation Semantic Segmentation

Memory Matching Networks for One-Shot Image Recognition

no code implementations CVPR 2018 Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, Tao Mei

In this paper, we introduce the new ideas of augmenting Convolutional Neural Networks (CNNs) with Memory and learning to learn the network parameters for the unlabelled images on the fly in one-shot learning.

One-Shot Learning

Deep Semantic Hashing with Generative Adversarial Networks

no code implementations 23 Apr 2018 Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes, and a classification stream.

General Classification Image Retrieval

To Create What You Tell: Generating Videos from Captions

no code implementations 23 Apr 2018 Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei

In this paper, we present a novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and caption embedding, and then is transformed into a frame sequence with 3D spatio-temporal convolutions.

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

2 code implementations ICCV 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.

Action Recognition

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

no code implementations CVPR 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner.

Action Recognition Fine-Grained Image Classification +2

Video Captioning with Transferred Semantic Attributes

no code implementations CVPR 2017 Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community.

Video Captioning

Boosting Image Captioning with Attributes

no code implementations ICCV 2017 Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.

Image Captioning

Deep Learning for Video Classification and Captioning

1 code implementation 22 Sep 2016 Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.

Classification General Classification +2

Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization

no code implementations CVPR 2016 Ting Yao, Tao Mei, Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life logging first-person videos.

Video Summarization

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

no code implementations CVPR 2016 Jun Xu, Tao Mei, Ting Yao, Yong Rui

In this paper we present MSR-VTT (standing for "MSR-Video to Text") which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text.

Image Captioning Video Description +1

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

no code implementations CVPR 2016 Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.

Action Recognition Event Detection

Learning Query and Image Similarities With Ranking Canonical Correlation Analysis

no code implementations ICCV 2015 Ting Yao, Tao Mei, Chong-Wah Ngo

One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image.

Image Retrieval

Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition

no code implementations CVPR 2015 Ting Yao, Yingwei Pan, Chong-Wah Ngo, Houqiang Li, Tao Mei

In many real-world applications, we are often facing the problem of cross domain learning, i.e., to borrow the labeled data or transfer the already learnt knowledge from a source domain to a target domain.

Domain Adaptation Object Recognition

Jointly Modeling Embedding and Translation to Bridge Video and Language

no code implementations CVPR 2016 Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui

Our proposed LSTM-E consists of three components: 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.

Translation
