no code implementations • 9 Nov 2023 • Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei
In particular, a 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF, encouraging each view of the 3D scene to align with the given text prompt and hand-drawn sketch.
no code implementations • CVPR 2023 • Ting Yao, Yehao Li, Yingwei Pan, Tao Mei
Next, as every two neighboring edges compose a surface, we obtain the edge-level representation of each anchor edge via surface-to-edge aggregation over all neighboring surfaces.
1 code implementation • CVPR 2023 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Jianlin Feng, Hongyang Chao, Tao Mei
The rich semantics are further regarded as a semantic prior to trigger the learning of the Diffusion Transformer, which produces the output sentence through a diffusion process.
1 code implementation • 15 Nov 2022 • Zhaofan Qiu, Yehao Li, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei
In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named SPE-Net.
2 code implementations • 11 Jul 2022 • Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, Tao Mei
Motivated by wavelet theory, we construct a new Wavelet Vision Transformer (Wave-ViT) that formulates invertible down-sampling with wavelet transforms and self-attention learning in a unified way.
Ranked #199 on Image Classification on ImageNet
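The invertibility claimed above can be illustrated with a one-level 2D Haar transform: the input map is split into four half-resolution sub-bands, and the original can be recovered exactly. This is a minimal numpy sketch of the general idea, not the authors' Wave-ViT implementation.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2D Haar wavelet transform: splits an (H, W) map
    into four half-resolution sub-bands (LL, LH, HL, HH)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row details
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: perfect reconstruction is what makes
    wavelet down-sampling lossless (invertible)."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d = np.empty_like(a)
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x

x = np.random.rand(8, 8)
ll, lh, hl, hh = haar_dwt2(x)
assert np.allclose(haar_idwt2(ll, lh, hl, hh), x)  # lossless round trip
```

In Wave-ViT, such sub-bands replace lossy strided down-sampling inside self-attention, so no information is discarded when reducing resolution.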
1 code implementation • 11 Jul 2022 • Ting Yao, Yehao Li, Yingwei Pan, Yu Wang, Xiao-Ping Zhang, Tao Mei
Dual-ViT is hence able to reduce computational complexity without compromising much accuracy.
1 code implementation • CVPR 2022 • Yehao Li, Yingwei Pan, Ting Yao, Tao Mei
In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), which unifies an enriched semantic comprehending process and a learnable semantic ordering process into a single architecture.
1 code implementation • 13 Jun 2022 • Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021. No Interaction Track: this track targets learning policies from pre-collected demonstration trajectories.
no code implementations • 11 Jan 2022 • Yehao Li, Jiahao Fan, Yingwei Pan, Ting Yao, Weiyao Lin, Tao Mei
Vision-language pre-training has been an emerging and fast-developing research topic, which transfers multi-modal knowledge from rich-resource pre-training tasks to limited-resource downstream tasks.
no code implementations • 14 Dec 2021 • Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
The BERT-type structure has led to a revolution in vision-language pre-training and the achievement of state-of-the-art results on numerous vision-language downstream tasks.
2 code implementations • 18 Aug 2021 • Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei
Nevertheless, no open-source codebase has supported training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.
7 code implementations • 26 Jul 2021 • Yehao Li, Ting Yao, Yingwei Pan, Tao Mei
Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation.
Ranked #269 on Image Classification on ImageNet
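The key idea above, computing the attention matrix from context-enriched keys rather than isolated key vectors, can be sketched in a toy 1D form. This is an illustrative simplification under assumed shapes, not the paper's actual block (which uses convolutions over 2D feature maps); the sliding-window average stands in for local context mining.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def contextual_attention(q, k, v, win=3):
    """Toy 1D illustration: keys are first enriched with their local
    context (a sliding-window average over neighbors), and the dynamic
    attention matrix is computed from these contextualized keys."""
    n, d = k.shape
    pad = win // 2
    kp = np.pad(k, ((pad, pad), (0, 0)), mode="edge")
    k_ctx = np.stack([kp[i:i + win].mean(axis=0) for i in range(n)])
    attn = softmax(q @ k_ctx.T / np.sqrt(d))   # context-guided attention
    return attn @ v

q = np.random.rand(5, 8)
k = np.random.rand(5, 8)
v = np.random.rand(5, 8)
out = contextual_attention(q, k, v)
assert out.shape == (5, 8)
```

The contrast with vanilla self-attention is only in how the keys are formed; everything downstream (softmax weighting of values) is unchanged.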
1 code implementation • 27 Jan 2021 • Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei
Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging.
no code implementations • 27 Jul 2020 • Yingwei Pan, Jun Xu, Yehao Li, Ting Yao, Tao Mei
The Pre-training for Video Captioning Challenge 2020 Summary: results and challenge participants' technical reports.
no code implementations • 5 Jul 2020 • Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei
In this work, we present Auto-captions on GIF, which is a new large-scale pre-training dataset for generic video understanding.
no code implementations • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Chong-Wah Ngo, Tao Mei
A clustering branch is capitalized on to ensure that the learnt representation preserves such underlying structure by matching the estimated assignment distribution over clusters to the inherent cluster distribution for each target sample.
2 code implementations • CVPR 2020 • Yingwei Pan, Ting Yao, Yehao Li, Tao Mei
Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the second-order interactions across multi-modal inputs.
Ranked #21 on Image Captioning on COCO Captions
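"Second-order interactions" here means every pairwise product between the components of two feature vectors, i.e. their outer product. A minimal sketch of classic bilinear pooling (the `compact_projection` helper is hypothetical, standing in for the random projections used by compact variants to tame the quadratic dimensionality):

```python
import numpy as np

def bilinear_pool(x, y):
    """Classic bilinear pooling: the flattened outer product captures
    all pairwise (second-order) interactions between two features."""
    return np.outer(x, y).flatten()

def compact_projection(z, dim_out, rng):
    """Hypothetical helper: a random linear projection, a stand-in for
    compact/low-rank variants that keep the quadratic feature tractable."""
    w = rng.standard_normal((dim_out, z.size)) / np.sqrt(z.size)
    return w @ z

x = np.random.rand(16)   # e.g. a visual feature
y = np.random.rand(16)   # e.g. a question/text feature
z = bilinear_pool(x, y)
assert z.shape == (256,)  # 16 * 16 pairwise interactions
```

A first-order fusion (concatenation or elementwise product) would yield only 16 or 32 values; the quadratic blow-up is exactly why compact approximations exist.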
2 code implementations • 8 Oct 2019 • Yingwei Pan, Yehao Li, Qi Cai, Yang Chen, Ting Yao
Semi-Supervised Domain Adaptation: for this task, we adopt a standard self-learning framework to construct a classifier based on the labeled source and target data, and generate pseudo labels for the unlabeled target data.
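The self-learning loop described above can be sketched with a deliberately simple nearest-centroid classifier (an assumption for illustration; the actual systems use deep networks): fit on the labeled data, pseudo-label the unlabeled target data, then refit on the union.

```python
import numpy as np

def nearest_centroid_fit(X, y, n_classes):
    """Per-class mean vectors serve as the classifier."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def self_learning(Xl, yl, Xu, n_classes, rounds=3):
    """Standard self-learning: train on labeled data, pseudo-label the
    unlabeled target data, retrain on labeled + pseudo-labeled union."""
    c = nearest_centroid_fit(Xl, yl, n_classes)
    for _ in range(rounds):
        pseudo = predict(c, Xu)
        X = np.vstack([Xl, Xu])
        y = np.concatenate([yl, pseudo])
        c = nearest_centroid_fit(X, y, n_classes)
    return c

rng = np.random.default_rng(0)
Xl = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
yl = np.array([0] * 5 + [1] * 5)
# Target data is shifted relative to the source clusters
Xu = np.vstack([rng.normal(0.5, 0.1, (5, 2)), rng.normal(2.5, 0.1, (5, 2))])
c = self_learning(Xl, yl, Xu, 2)
assert (predict(c, Xu) == np.array([0] * 5 + [1] * 5)).all()
```

Retraining on the pseudo-labeled target data pulls the centroids toward the target distribution, which is the mechanism that makes self-learning a domain-adaptation tool rather than plain semi-supervised learning.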
no code implementations • ICCV 2019 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
It is well believed that parsing an image into constituent visual patterns would be helpful for understanding and representing an image.
no code implementations • 9 Sep 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
The problem of distance metric learning is mostly considered from the perspective of learning an embedding space, where the distances between pairs of examples are in correspondence with a similarity metric.
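The "distances correspond to similarity" idea is usually enforced with a ranking objective; the triplet loss is the canonical example (shown here as a generic illustration of the embedding-space view, not this paper's specific formulation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """A common metric-learning objective: the positive must be closer
    to the anchor than the negative by at least `margin`, measured by
    Euclidean distance in the learned embedding space."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same-class example, close to the anchor
n = np.array([1.0, 1.0])   # different-class example, far away
assert triplet_loss(a, p, n) == 0.0  # constraint already satisfied
```

Gradients of such losses flow into the embedding network, so "learning the metric" reduces to learning an embedding under a fixed distance.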
no code implementations • 14 Jun 2019 • Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao
This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.
1 code implementation • 3 May 2019 • Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei
Moreover, the inherent recurrent dependency in RNNs prevents parallelization within a sequence during training and therefore limits computational efficiency.
no code implementations • CVPR 2019 • Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei
Specifically, we present Transferrable Prototypical Networks (TPN) for adaptation such that the prototypes for each class in source and target domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar.
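The two ingredients named above, per-class prototypes and the score distributions they induce, can be sketched directly (a minimal numpy illustration of the prototypical-network machinery, not the TPN training procedure itself):

```python
import numpy as np

def prototypes(X, y, n_classes):
    """Class prototypes: the mean embedding of each class's examples."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def score_distribution(protos, x, temp=1.0):
    """Softmax over negative squared distances to the prototypes — the
    score distribution that TPN aligns across source and target data."""
    d = ((protos - x) ** 2).sum(axis=1)
    e = np.exp(-d / temp)
    return e / e.sum()

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
P = prototypes(X, y, 2)           # class means: (0, 0.5) and (5, 5.5)
p = score_distribution(P, np.array([0.0, 0.5]))
assert p[0] > p[1]                # query lands on class 0's prototype
```

TPN's adaptation signal then comes from two alignment terms: pulling the source and target prototypes of each class together, and matching the score distributions the prototypes produce on source versus target samples.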
no code implementations • CVPR 2019 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
Image captioning has received significant attention, with remarkable improvements achieved by recent advances.
no code implementations • ECCV 2018 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
Technically, we build graphs over the detected objects in an image based on their spatial and semantic connections.
no code implementations • CVPR 2018 • Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
A valid question is how to temporally localize and then describe events, which is known as "dense video captioning."
no code implementations • CVPR 2017 • Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
Image captioning often requires a large set of training image-sentence pairs.
no code implementations • ICCV 2017 • Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.