Search Results for author: Ting Yao

Found 121 papers, 50 papers with code

Contextual Transformer Networks for Visual Recognition

7 code implementations • 26 Jul 2021 • Yehao Li, Ting Yao, Yingwei Pan, Tao Mei

Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.

Ranked #288 on Image Classification on ImageNet

Image Classification Instance Segmentation +3

10,805

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

2 code implementations • 11 Jul 2022 • Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, Tao Mei

Motivated by the wavelet theory, we construct a new Wavelet Vision Transformer (Wave-ViT) that formulates the invertible down-sampling with wavelet transforms and self-attention learning in a unified way.

Ranked #209 on Image Classification on ImageNet

Image Classification Instance Segmentation +4

2,972

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

2 code implementations • 18 Aug 2021 • Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei

Nevertheless, there has not been an open-source codebase in support of training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.

Cross-Modal Retrieval Image Captioning +5

1,001

Comprehending and Ordering Semantics for Image Captioning

1 code implementation • CVPR 2022 • Yehao Li, Yingwei Pan, Ting Yao, Tao Mei

In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture.

Cross-Modal Retrieval Image Captioning +2

1,001

Semantic-Conditional Diffusion Networks for Image Captioning

1 code implementation • CVPR 2023 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei

The rich semantics are further regarded as semantic prior to trigger the learning of Diffusion Transformer, which produces the output sentence in a diffusion process.

Cross-Modal Retrieval Image Captioning +3

1,001

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

1 code implementation • 16 Aug 2019 • Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei

It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations.

757

Sogou Machine Reading Comprehension Toolkit

1 code implementation • 28 Mar 2019 • Jindou Wu, Yunlun Yang, Chao Deng, Hongyi Tang, Bingning Wang, Haoze Sun, Ting Yao, Qi Zhang

In this paper, we present a Sogou Machine Reading Comprehension (SMRC) toolkit that can be used to provide the fast and efficient development of modern machine comprehension models, including both published models and original prototypes.

Machine Reading Comprehension

744

Relation Distillation Networks for Video Object Detection

2 code implementations • ICCV 2019 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.

Object object-detection +3

562

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

2 code implementations • ICCV 2017 • Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.

Ranked #7 on Action Recognition on Sports-1M

Action Recognition Philosophy +2

450

X-Linear Attention Networks for Image Captioning

2 code implementations • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Tao Mei

Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the 2$^{nd}$ order interactions across multi-modal inputs.

Ranked #21 on Image Captioning on COCO Captions

Fine-Grained Visual Recognition Image Captioning +3

268

Dual Vision Transformer

1 code implementation • 11 Jul 2022 • Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, Xiao-Ping Zhang, Tao Mei

Dual-ViT is henceforth able to reduce the computational complexity without compromising much accuracy.

180

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

3 code implementations • 3 Aug 2020 • Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

In this paper, we compose a trilogy of exploring the basic and generic supervision in the sequence from spatial, spatiotemporal and sequential perspectives.

Action Recognition Contrastive Learning +3

152

T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

1 code implementation • 7 Apr 2023 • Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhijing Wu, Xiangsheng Li, Haitao Li, Yiqun Liu, Jin Ma

T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines.

Passage Ranking Passage Re-Ranking +3

139

Learning a Unified Sample Weighting Network for Object Detection

1 code implementation • CVPR 2020 • Qi Cai, Yingwei Pan, Yu Wang, Jingen Liu, Ting Yao, Tao Mei

To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample's task weights.

General Classification Object +3

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of human body into consideration.

Ranked #6 on 3D Human Pose Estimation on MPI-INF-3DHP

3D Human Pose Estimation

Generalized One-shot Domain Adaptation of Generative Adversarial Networks

2 code implementations • 8 Sep 2022 • ZiCheng Zhang, Yinglu Liu, Congying Han, Tiande Guo, Ting Yao, Tao Mei

While previous works mainly focus on style transfer, we propose a novel and concise framework to address the generalized one-shot adaptation task for both style and entity transfer, in which a reference image and its binary entity mask are provided.

Domain Adaptation Generative Adversarial Network +1

Multi-Source Domain Adaptation and Semi-Supervised Domain Adaptation with Focus on Visual Domain Adaptation Challenge 2019

2 code implementations • 8 Oct 2019 • Yingwei Pan, Yehao Li, Qi Cai, Yang Chen, Ting Yao

Semi-Supervised Domain Adaptation: For this task, we adopt a standard self-learning framework to construct a classifier based on the labeled source and target data, and generate the pseudo labels for unlabeled target data.

Domain Adaptation Self-Learning +1

Lightweight and Progressively-Scalable Networks for Semantic Segmentation

1 code implementation • 27 Jul 2022 • Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei

In this paper, we thoroughly analyze the design of convolutional blocks (the type of convolutions and the number of channels in convolutions), and the ways of interactions across multiple scales, all from lightweight standpoint for semantic segmentation.

Segmentation Semantic Segmentation

Exploring Object Relation in Mean Teacher for Cross-Domain Detection

1 code implementation • CVPR 2019 • Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Ling-Yu Duan, Ting Yao

The whole architecture is then optimized with three consistency regularizations: 1) region-level consistency to align the region-level predictions between teacher and student, 2) inter-graph consistency for matching the graph structures between teacher and student, and 3) intra-graph consistency to enhance the similarity between regions of same class within the graph of student.

Relation Unsupervised Domain Adaptation

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

1 code implementation • CVPR 2022 • Yong Zhang, Yingwei Pan, Ting Yao, Rui Huang, Tao Mei, Chang-Wen Chen

Such design decomposes the process of HOI set prediction into two subsequent phases, i.e., an interaction proposal generation is first performed, and then followed by transforming the non-parametric interaction proposals into HOI predictions via a structure-aware Transformer.

Ranked #3 on Human-Object Interaction Detection on V-COCO

Human-Object Interaction Detection Object

Joint Contrastive Learning with Infinite Possibilities

1 code implementation • NeurIPS 2020 • Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

This paper explores useful modifications of the recent development in contrastive learning via novel probabilistic modeling.

Contrastive Learning

A Self-Training Method for Machine Reading Comprehension with Soft Evidence Extraction

1 code implementation • ACL 2020 • Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, Minlie Huang

Neural models have achieved great success on machine reading comprehension (MRC), many of which typically consist of two components: an evidence extractor and an answer predictor.

Machine Reading Comprehension Multi-Choice MRC +1

Stand-Alone Inter-Frame Attention in Video Models

1 code implementation • CVPR 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei

In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.

Ranked #13 on Action Recognition on Something-Something V1

Action Classification Action Recognition +1

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

1 code implementation • 22 Jun 2020 • Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang

The release of ReCO consists of 300k questions, which to our knowledge makes it the largest in Chinese reading comprehension.

Causal Inference Chinese Reading Comprehension +3

3D Cascade RCNN: High Quality Object Detection in Point Clouds

1 code implementation • 15 Nov 2022 • Qi Cai, Yingwei Pan, Ting Yao, Tao Mei

Recent progress on 2D object detection has featured Cascade RCNN, which capitalizes on a sequence of cascade detectors to progressively improve proposal quality, towards high-quality object detection.

3D Object Detection Object +2

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

1 code implementation • 5 Aug 2022 • Bingning Wang, Feiyang Lv, Ting Yao, Yiming Yuan, Jin Ma, Yu Luo, Haijin Liang

However, in most of the public visual question answering datasets such as VQA and CLEVR, the questions are human-generated and specific to the given image, such as "What color are her eyes?".

Image Retrieval Question Answering +2

Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space

1 code implementation • CVPR 2023 • Yong Zhang, Yingwei Pan, Ting Yao, Rui Huang, Tao Mei, Chang-Wen Chen

Specifically, cheap scene graph supervision data can be easily obtained by parsing image language descriptions into semantic graphs.

Graph Generation object-detection +3

3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

1 code implementation • 9 Nov 2023 • Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Tao Mei

In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.

Image Generation

Single Shot Video Object Detector

1 code implementation • 7 Jul 2020 • Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos.

Object object-detection +2

PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering

1 code implementation • CVPR 2023 • Fuchen Long, Ting Yao, Zhaofan Qiu, Lusong Li, Tao Mei

Feature invariance under different data transformations, i.e., transformation invariance, can be regarded as a type of self-supervision for representation learning.

Clustering Deep Clustering +4

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

1 code implementation • 13 Jun 2022 • Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets for learning policies from pre-collected demonstration trajectories.

Imitation Learning

AnchorFormer: Point Cloud Completion From Discriminative Nodes

1 code implementation • CVPR 2023 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Point cloud completion aims to recover the completed 3D shape of an object from its partial observation.

MORPH Object +1

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

1 code implementation • 14 Dec 2021 • Jingyang Lin, Yingwei Pan, Rongfeng Lai, Xuehang Yang, Hongyang Chao, Ting Yao

In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module, to mitigate that issue.

Relation Relational Reasoning +2

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

1 code implementation • CVPR 2023 • Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

POP builds a set of orthogonal prototypes, each of which represents a semantic class, and makes the prediction for each class separately based on the features projected onto its prototype.

Ranked #1 on Generalized Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Generalized Few-Shot Semantic Segmentation

Learning to Localize Actions from Moments

1 code implementation • ECCV 2020 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.

Action Localization Transfer Learning

Dynamic Temporal Filtering in Video Models

1 code implementation • 15 Nov 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei

The pre-determined kernel size severely limits the temporal receptive fields and the fixed weights treat each spatial location across frames equally, resulting in a sub-optimal solution for long-range temporal modeling in natural scenes.

CARIS: Context-Augmented Referring Image Segmentation

1 code implementation • ACM MM 2023 • Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

Technically, CARIS develops a context-aware mask decoder with sequential bidirectional cross-modal attention to integrate the linguistic features with visual context, which are then aligned with pixel-wise visual features.

Image Segmentation Segmentation +1

ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks

1 code implementation • 16 Jan 2021 • Bingning Wang, Ting Yao, WeiPeng Chen, Jingfang Xu, Xiaochuan Wang

In compositional question answering, the systems should assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA.

Answer Selection Machine Reading Comprehension +2

Motion-Focused Contrastive Learning of Video Representations

1 code implementation • ICCV 2021 • Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.

Contrastive Learning Data Augmentation +2

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

1 code implementation • 3 May 2019 • Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei

Moreover, the inherently recurrent dependency in RNN prevents parallelization within a sequence during training and therefore limits the computations.

Sentence Video Captioning

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement

1 code implementation • 15 Nov 2022 • Zhaofan Qiu, Yehao Li, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named SPE-Net.

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

1 code implementation • 27 Jan 2021 • Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.

Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization

1 code implementation • 26 Sep 2022 • Jingyang Lin, Yu Wang, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

Existing works attempt to solve the problem by explicitly imposing uncertainty on classifiers when OOD inputs are exposed to the classifier during training.

Outlier Detection Out-of-Distribution Detection +1

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

1 code implementation • NeurIPS 2021 • Yu Wang, Jingyang Lin, Jingjing Zou, Yingwei Pan, Ting Yao, Tao Mei

Our work reveals a structured shortcoming of the existing mainstream self-supervised learning methods.

Self-Supervised Learning

Multi-Lingual Question Generation with Language Agnostic Language Model

1 code implementation • Findings (ACL) 2021 • Bingning Wang, Ting Yao, WeiPeng Chen, Jingfang Xu, Xiaochuan Wang

Language Modelling Question Generation +1

Gaussian Temporal Awareness Networks for Action Localization

1 code implementation • CVPR 2019 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

Temporally localizing actions in a video is a fundamental challenge in video understanding.

Action Localization object-detection +2

DPTDR: Deep Prompt Tuning for Dense Passage Retrieval

1 code implementation • COLING 2022 • Zhengyang Tang, Benyou Wang, Ting Yao

We believe this work facilitates the industry, as it saves enormous efforts and costs of deployment and increases the utility of computing resources.

Language Modelling Natural Questions +2

Optimization Planning for 3D ConvNets

1 code implementation • 11 Jan 2022 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.

Video Recognition

Fully Convolutional Adaptation Networks for Semantic Segmentation

no code implementations • CVPR 2018 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.

Domain Adaptation Semantic Segmentation

Memory Matching Networks for One-Shot Image Recognition

no code implementations • CVPR 2018 • Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, Tao Mei

In this paper, we introduce the new ideas of augmenting Convolutional Neural Networks (CNNs) with Memory and learning to learn the network parameters for the unlabelled images on the fly in one-shot learning.

One-Shot Learning Philosophy

Deep Semantic Hashing with Generative Adversarial Networks

no code implementations • 23 Apr 2018 • Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolution neural networks (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes and a classification stream.

General Classification Image Retrieval +1

Jointly Localizing and Describing Events for Dense Video Captioning

no code implementations • CVPR 2018 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

A valid question is how to temporally localize and then describe events, which is known as "dense video captioning."

Attribute Dense Video Captioning +3

To Create What You Tell: Generating Videos from Captions

no code implementations • 23 Apr 2018 • Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei

In this paper, we present a novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and caption embedding, and then is transformed into a frame sequence with 3D spatio-temporal convolutions.

Philosophy

Deep Learning for Video Classification and Captioning

1 code implementation • 22 Sep 2016 • Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.

Classification General Classification +3

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

no code implementations • CVPR 2017 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Image captioning often requires a large set of training image-sentence pairs.

Image Captioning Object Recognition +1

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

no code implementations • CVPR 2017 • Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner.

Action Recognition Fine-Grained Image Classification +3

Video Captioning with Transferred Semantic Attributes

no code implementations • CVPR 2017 • Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community.

Sentence Video Captioning

Boosting Image Captioning with Attributes

no code implementations • ICCV 2017 • Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.

Image Captioning

Jointly Modeling Embedding and Translation to Bridge Video and Language

no code implementations • CVPR 2016 • Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui

Our proposed LSTM-E consists of three components: a 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.

Sentence Translation

YH Technologies at ActivityNet Challenge 2018

no code implementations • 29 Jun 2018 • Ting Yao, Xue Li

This notebook paper presents an overview and comparative analysis of our systems designed for the following five tasks in ActivityNet Challenge 2018: temporal action proposals, temporal action localization, dense-captioning events in videos, trimmed action recognition, and spatio-temporal action localization.

Action Recognition Dense Captioning +2

Exploring Visual Relationship for Image Captioning

no code implementations • ECCV 2018 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.

Image Captioning Sentence

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations • ECCV 2018 • Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in next frame in a recurrent manner.

Action Detection Region Proposal

Learning Query and Image Similarities With Ranking Canonical Correlation Analysis

no code implementations • ICCV 2015 • Ting Yao, Tao Mei, Chong-Wah Ngo

One of the fundamental problems in image search is to learn the ranking functions, i.e., similarity between the query and image.

Image Retrieval

Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition

no code implementations • CVPR 2015 • Ting Yao, Yingwei Pan, Chong-Wah Ngo, Houqiang Li, Tao Mei

In many real-world applications, we often face the problem of cross-domain learning, i.e., to borrow the labeled data or transfer the already learnt knowledge from a source domain to a target domain.

Domain Adaptation Object Recognition +1

You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images

no code implementations • CVPR 2016 • Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei

The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.

Action Recognition Event Detection +1

Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization

no code implementations • CVPR 2016 • Ting Yao, Tao Mei, Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life logging first-person videos.

Highlight Detection Video Summarization

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

no code implementations • CVPR 2016 • Jun Xu, Tao Mei, Ting Yao, Yong Rui

In this paper we present MSR-VTT (standing for "MSR-Video to Text") which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text.

Image Captioning Sentence +2

Pointing Novel Objects in Image Captioning

no code implementations • CVPR 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

Image captioning has received significant attention with remarkable improvements in recent advances.

Image Captioning Object +2

Transferrable Prototypical Networks for Unsupervised Domain Adaptation

no code implementations • CVPR 2019 • Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei

Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.

Pseudo Label Unsupervised Domain Adaptation

Learning Spatio-Temporal Representation with Local and Global Diffusion

no code implementations • CVPR 2019 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei

Diffusions effectively interact two aspects of information, i.e., localized and holistic, for a more powerful way of representation learning.

Ranked #8 on Action Recognition on UCF101

Action Classification Action Detection +5

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

no code implementations • 14 Jun 2019 • Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao

This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.

Action Recognition Dense Captioning +2

vireoJD-MM at Activity Detection in Extended Videos

no code implementations • 20 Jun 2019 • Fuchen Long, Qi Cai, Zhaofan Qiu, Zhijian Hou, Yingwei Pan, Ting Yao, Chong-Wah Ngo

This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019.

Action Detection Action Localization +1

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

no code implementations • 1 Aug 2019 • Jing Wang, Yingwei Pan, Ting Yao, Jinhui Tang, Tao Mei

A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.

Ranked #4 on Image Paragraph Captioning on Image Paragraph Captioning

Descriptive Image Paragraph Captioning +2

Customizable Architecture Search for Semantic Segmentation

no code implementations • CVPR 2019 • Yiheng Zhang, Zhaofan Qiu, Jingen Liu, Ting Yao, Dong Liu, Tao Mei

As a result, our CAS is able to search an optimized architecture with customized constraints.

Image Segmentation Segmentation +1

Mocycle-GAN: Unpaired Video-to-Video Translation

no code implementations • 26 Aug 2019 • Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Unsupervised image-to-image translation is the task of translating an image from one domain to another in the absence of any paired training examples and tends to be more applicable to practical applications.

Motion Estimation Translation +1

Hierarchy Parsing for Image Captioning

no code implementations • ICCV 2019 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

It is always well believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image.

Image Captioning

Deep Metric Learning with Density Adaptivity

no code implementations • 9 Sep 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei

The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.

Metric Learning

Scheduled Differentiable Architecture Search for Visual Recognition

no code implementations • 23 Sep 2019 • Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei

Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.

Video Recognition

Paper
Add Code

Vision and Language: from Visual Perception to Content Creation

no code implementations • 26 Dec 2019 • Tao Mei, Wei Zhang, Ting Yao

The real-world deployment or services of vision and language are elaborated as well.

Question Answering valid +3

Paper
Add Code

Long Short-Term Relation Networks for Video Action Detection

no code implementations • 31 Mar 2020 • Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei

It has been well recognized that modeling human-object or object-object relations would be helpful for detection task.

Action Detection Object +2

Paper
Add Code

Transferring and Regularizing Prediction for Semantic Segmentation

no code implementations • CVPR 2020 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei

In the view of extremely expensive expert labeling, recent research has shown that the models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.

Ranked #17 on Domain Adaptation on SYNTHIA-to-Cityscapes

Domain Adaptation Segmentation +1

Paper
Add Code

Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation

no code implementations • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei

A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.

Clustering Unsupervised Domain Adaptation

Paper
Add Code

Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

no code implementations • 5 Jul 2020 • Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei

In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.

Question Answering Sentence +3

Paper
Add Code

Pre-training for Video Captioning Challenge 2020 Summary

no code implementations • 27 Jul 2020 • Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei

The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.

Video Captioning

Paper
Add Code

Three-dimensional cell culture model for hepatocytes opens a new avenue of real world research on liver

no code implementations • 19 Nov 2019 • Ting Yao, Yi Zhang, Mengjiao Lv, Guoqing Zang, Soon Seng Ng, Xiaohua Chen

3-dimensional (3D) culture model is a valuable in vitro tool to study liver biology, metabolism, organogenesis, tissue morphology, drug discovery and cell-based assays.

Cultural Vocal Bursts Intensity Prediction Drug Discovery

Paper
Add Code

A Low Rank Promoting Prior for Unsupervised Contrastive Learning

no code implementations • 5 Aug 2021 • Yu Wang, Jingyang Lin, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

In this paper, we construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning, referred to as LORAC.

Contrastive Learning Image Classification +5

Paper
Add Code

A Style and Semantic Memory Mechanism for Domain Generalization

no code implementations • ICCV 2021 • Yang Chen, Yu Wang, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei

Correspondingly, we also propose a novel "jury" mechanism, which is particularly effective in learning useful semantic feature commonalities among domains.

Ranked #37 on Domain Generalization on PACS

Domain Generalization

Paper
Add Code

Transferrable Contrastive Learning for Visual Domain Adaptation

no code implementations • 14 Dec 2021 • Yang Chen, Yingwei Pan, Yu Wang, Ting Yao, Xinmei Tian, Tao Mei

From this point, we present a particular paradigm of self-supervised learning tailored for domain adaptation, i.e., Transferrable Contrastive Learning (TCL), which links the SSL and the desired cross-domain transferability congruently.

Contrastive Learning Domain Adaptation +2

Paper
Add Code

CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising

no code implementations • 14 Dec 2021 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

BERT-type structure has led to the revolution of vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.

Cross-Modal Retrieval Denoising +6

Paper
Add Code

Responsive Listening Head Generation: A Benchmark Dataset and Baseline

no code implementations • 27 Dec 2021 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Automatically synthesizing listening behavior that actively responds to a talking head is critical to applications such as digital human, virtual agents and social robots.

Talking Head Generation Translation

Paper
Add Code

Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training

no code implementations • 11 Jan 2022 • Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei

Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training task to limited-resource downstream tasks.

Image Captioning Language Modelling +3

Paper
Add Code

Smart Director: An Event-Driven Directing System for Live Broadcasting

no code implementations • 11 Jan 2022 • Yingwei Pan, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, Tao Mei

To our best knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events.

Event Detection Highlight Detection

Paper
Add Code

Representing Videos as Discriminative Sub-graphs for Action Recognition

no code implementations • CVPR 2021 • Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.

Action Recognition Graph Learning +1

Paper
Add Code

Boosting Video Representation Learning with Multi-Faceted Integration

no code implementations • CVPR 2021 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xiao-Ping Zhang, Dong Wu, Tao Mei

Video content is multifaceted, consisting of objects, scenes, interactions or actions.

Action Recognition Representation Learning +1

Paper
Add Code

Condensing a Sequence to One Informative Frame for Video Recognition

no code implementations • ICCV 2021 • Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei

This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits off-the-shelf image recognition system on the synthetic frame.

Motion Estimation valid +1

Paper
Add Code

Visualizing and Understanding Patch Interactions in Vision Transformer

no code implementations • 11 Mar 2022 • Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions.

Paper
Add Code

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing

no code implementations • CVPR 2022 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

By deriving the novel grouped time mixing (GTM) operations, we equip the basic token-mixing MLP with the ability of temporal modeling.

Ranked #21 on Action Recognition on Something-Something V1

3D Architecture Action Classification +2

Paper
Add Code

Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models

no code implementations • 11 Jun 2022 • Han Liu, Bingning Wang, Ting Yao, Haijin Liang, Jianjin Xu, Xiaolin Hu

Large-scale pre-trained language models have achieved great success on natural language generation tasks.

Attribute Text Generation

Paper
Add Code

Bi-Calibration Networks for Weakly-Supervised Video Representation Learning

1 code implementation • 21 Jun 2022 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.

Representation Learning

Paper
Code

Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation

no code implementations • 19 Sep 2022 • Hailin Shi, Hang Du, Yibo Hu, Jun Wang, Dan Zeng, Ting Yao

Such multi-shot scheme brings inference burden, and the predefined scales inevitably have gap from real data.

Face Recognition

Paper
Add Code

Explaining Cross-Domain Recognition with Interpretable Deep Classifier

no code implementations • 15 Nov 2022 • Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei

In this paper, we ask the question: how much each sample in source domain contributes to the network's prediction on the samples from target domain.

Unsupervised Domain Adaptation

Paper
Add Code

Modality-Agnostic Debiasing for Single Domain Generalization

no code implementations • CVPR 2023 • Sanqing Qu, Yingwei Pan, Guang Chen, Ting Yao, Changjun Jiang, Tao Mei

We validate the superiority of our MAD in a variety of single-DG scenarios with different modalities, including recognition on 1D texts, 2D images, 3D point clouds, and semantic segmentation on 2D images.

Data Augmentation Domain Generalization +1

Paper
Add Code

Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

no code implementations • CVPR 2023 • ZiCheng Zhang, Yinglu Liu, Congying Han, Yingwei Pan, Tiande Guo, Ting Yao

Simply coupling NeRF with photorealistic style transfer (PST) will result in cross-view inconsistency and degradation of stylized view syntheses.

Novel View Synthesis Style Transfer

Paper
Add Code

HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces

no code implementations • CVPR 2023 • Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

Next, as every two neighbor edges compose a surface, we obtain the edge-level representation of each anchor edge via surface-to-edge aggregation over all neighbor surfaces.

3D Object Classification Semantic Segmentation

Paper
Add Code

Visual-Aware Text-to-Speech

no code implementations • 21 Jun 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction.

Speech Synthesis

Paper
Add Code

Deep Equilibrium Multimodal Fusion

no code implementations • 29 Jun 2023 • Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei

Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.

Visual Question Answering (VQA)

Paper
Add Code

Interactive Conversational Head Generation

no code implementations • 5 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao

Based on ViCo and ViCo-X, we define three novel tasks targeting the interaction modeling during the face-to-face conversation: 1) responsive listening head generation making listeners respond actively to the speaker with non-verbal signals, 2) expressive talking head generation guiding speakers to be aware of listeners' behaviors, and 3) conversational head generation to integrate the talking/listening ability in one interlocutor.

Sentence Talking Head Generation

Paper
Add Code

Learning and Evaluating Human Preferences for Conversational Head Generation

no code implementations • 20 Jul 2023 • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions.

Paper
Add Code

Selective Volume Mixup for Video Action Recognition

no code implementations • 18 Sep 2023 • Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei

In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.

Action Recognition Image Augmentation +1

Paper
Add Code

ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion

no code implementations • ICCV 2023 • Qi Cai, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei

Recent progress on multi-modal 3D object detection has featured BEV (Bird-Eye-View) based fusion, which effectively unifies both LiDAR point clouds and camera images in a shared BEV space.

3D Object Detection Depth Estimation +2

Paper
Add Code

Learning Neural Implicit Surfaces with Object-Aware Radiance Fields

no code implementations • ICCV 2023 • Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Then, we build the geometric correspondence between 2D planes and 3D meshes by rasterization, and project the estimated object regions into 3D explicit object surfaces by aggregating the object information across multiple views.

3D Object Reconstruction Object

Paper
Add Code

Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis

no code implementations • 8 Oct 2023 • Peipei Li, Xing Cui, Yibo Hu, Man Zhang, Ting Yao, Tao Mei

Directly employing small models may result in a significant drop in performance since it is difficult for a small model to adequately capture local structure and global shape information simultaneously, which are essential clues for point cloud analysis.

Semantic Segmentation

Paper
Add Code

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

no code implementations • 9 Nov 2023 • Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

To achieve this, we present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network enabling more conditions of text prompts and style images.

Style Transfer Text-to-Image Generation

Paper
Add Code

Control3D: Towards Controllable Text-to-3D Generation

no code implementations • 9 Nov 2023 • Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei

In particular, a 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of 3D scene parameterized as NeRF, encouraging each view of 3D scene aligned with the given text prompt and hand-drawn sketch.

3D Generation Text to 3D

Paper
Add Code

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

no code implementations • 2 Jan 2024 • Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

The diffusion model incorporates the reference images as the condition and alignment to strengthen the content consistency of multi-scene videos.

Descriptive Video Generation

Paper
Add Code

HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs

no code implementations • 18 Mar 2024 • Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (namely HIRI-ViT), that upgrades prevalent four-stage ViT to five-stage ViT tailored for high-resolution inputs.

Paper
Add Code

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

no code implementations • 25 Mar 2024 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE.

Denoising Image Super-Resolution +3

Paper
Add Code

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

no code implementations • 25 Mar 2024 • Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei

In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in 2D visual prompt to boost text-to-3D generation.

3D Generation Text to 3D +1

Paper
Add Code

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

no code implementations • 25 Mar 2024 • Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

Despite this progress, mask strategy still suffers from two inherent limitations: (a) training-inference discrepancy and (b) fuzzy relations between mask reconstruction & generative diffusion process, resulting in sub-optimal training of DiT.

Image Generation

Paper
Add Code

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

no code implementations • 25 Mar 2024 • Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

Next, TRIP executes a residual-like dual-path scheme for noise prediction: 1) a shortcut path that directly takes image noise prior as the reference noise of each frame to amplify the alignment between the first frame and subsequent frames; 2) a residual path that employs 3D-UNet over noised video and static image latent codes to enable inter-frame relational reasoning, thereby easing the learning of the residual noise for each frame.

Image to Video Generation Relational Reasoning +1

Paper
Add Code

Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

no code implementations • 26 Mar 2024 • Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei

Diffusion models have recently brought a powerful revolution in image generation.

Denoising Image Generation +1

Paper
Add Code
