no code implementations • ECCV 2020 • Tan Yu, Yunfeng Cai, Ping Li
To boost the efficiency in the GPU platform, recent methods rely on Newton-Schulz (NS) iteration to approximate the matrix square-root.
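Newton-Schulz iteration approximates the matrix square root using only matrix multiplications, which is why it maps well to GPUs. A minimal numpy sketch of the coupled iteration follows; the pre-normalization scheme and iteration count here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate the square root of an SPD matrix A via coupled
    Newton-Schulz iteration (matmul-only, hence GPU-friendly)."""
    n = A.shape[0]
    norm = np.linalg.norm(A)   # pre-normalize so the iteration converges
    Y = A / norm               # Y converges to sqrt(A / norm)
    Z = np.eye(n)              # Z converges to the inverse square root
    I = np.eye(n)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Y * np.sqrt(norm)   # undo the normalization
```

Because convergence is quadratic, a handful of iterations typically suffices, avoiding the SVD that an exact square root would require.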
no code implementations • Findings (NAACL) 2022 • Jiaheng Liu, Tan Yu, Hanyu Peng, Mingming Sun, Ping Li
Existing multilingual video corpus moment retrieval (mVCMR) methods are mainly based on a two-stream structure.
no code implementations • EMNLP 2021 • Haoliang Liu, Tan Yu, Ping Li
Through an inflating operation followed by a shrinking operation, both efficiency and accuracy of a late-interaction model are boosted.
1 code implementation • 17 Mar 2024 • Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu
Meanwhile, our RebQ leverages extensive multi-modal knowledge from pre-trained LMMs to reconstruct the data of missing modality.
no code implementations • 25 Nov 2022 • Tan Yu, Ping Li
To bring back the global receptive field, window-based Vision Transformers have devoted considerable effort to achieving cross-window communication through sophisticated operations.
2 code implementations • 20 Nov 2022 • Shuo Chen, Tan Yu, Ping Li
Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
Ranked #1 on 3D Object Recognition on ModelNet40
no code implementations • 19 Oct 2022 • Yue Zhang, Hongliang Fei, Dingcheng Li, Tan Yu, Ping Li
In particular, we focus on few-shot image recognition tasks on pretrained vision-language models (PVLMs) and develop a method of prompting through prototype (PTP), where we define $K$ image prototypes and $K$ prompt prototypes.
no code implementations • 13 Oct 2022 • Tan Yu, Jun Zhi, Yufei Zhang, Jian Li, Hongliang Fei, Ping Li
In this paper, we formulate the APP-installation user embedding learning into a bipartite graph embedding problem.
no code implementations • 23 Sep 2022 • Tan Yu, Zhipeng Jin, Jie Liu, Yi Yang, Hongliang Fei, Ping Li
To overcome the limitations of behavior ID features in modeling new ads, we exploit the visual content in ads to boost the performance of CTR prediction models.
no code implementations • 19 Sep 2022 • Tan Yu, Jie Liu, Yi Yang, Yi Li, Hongliang Fei, Ping Li
How to pair the video ads with the user search is the core task of Baidu video advertising.
1 code implementation • 31 Jan 2022 • Tan Yu, Gangming Zhao, Ping Li, Yizhou Yu
To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows.
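Local self-attention restricts attention to non-overlapping windows, so the cost grows with window size rather than image size. A minimal sketch of the window-partition step that precedes the attention computation (the layout and window size are illustrative assumptions):

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows.
    Returns (num_windows, w*w, C); self-attention is then computed
    independently within each window of w*w tokens."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)
```

With windows of size w, attention cost per window is O((w^2)^2) instead of O((HW)^2) for the full feature map.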
2 code implementations • 25 Oct 2021 • Shuo Chen, Tan Yu, Ping Li
Nevertheless, multi-view CNN models cannot model the communications between patches from different views, limiting their effectiveness in 3D object recognition.
Ranked #2 on 3D Object Recognition on ModelNet40
no code implementations • ICLR 2022 • Tan Yu, Jun Li, Yunfeng Cai, Ping Li
A convolution layer with an orthogonal Jacobian matrix is 1-Lipschitz in the 2-norm, making the output robust to the perturbation in input.
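The 1-Lipschitz property follows from the fact that an orthogonal linear map preserves the 2-norm: a perturbation of the input yields an output perturbation of exactly the same 2-norm. A small numerical check, using a random orthogonal matrix as a stand-in for the convolution's Jacobian (an illustrative assumption, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
# QR factorization of a Gaussian matrix yields an orthogonal Q,
# standing in here for an orthogonal Jacobian of a convolution layer.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

x = rng.standard_normal(8)
delta = 1e-3 * rng.standard_normal(8)   # input perturbation

out_gap = np.linalg.norm(Q @ (x + delta) - Q @ x)
in_gap = np.linalg.norm(delta)
# out_gap == in_gap: the map is exactly 1-Lipschitz in the 2-norm
```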
3 code implementations • 2 Aug 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
More recently, using smaller patches with a pyramid structure, Vision Permutator (ViP) and Global Filter Network (GFNet) achieve better performance than S$^2$-MLP.
no code implementations • 28 Jun 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
By introducing the inductive bias from image processing, the convolutional neural network (CNN) has achieved excellent performance in numerous computer vision tasks and has been established as the \emph{de facto} backbone.
1 code implementation • 14 Jun 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
We discover that the token-mixing MLP is a variant of the depthwise convolution with a global receptive field and spatial-specific configuration.
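The equivalence can be seen directly: a token-mixing MLP applies one weight matrix along the token dimension, shared across channels, which is exactly a per-channel (depthwise) linear filter whose support covers all tokens. A minimal sketch of the two views (names and shapes are illustrative assumptions):

```python
import numpy as np

def token_mixing_mlp(x, W):
    """Token-mixing MLP view: x is (tokens, channels), and the weight
    matrix W of shape (tokens, tokens) mixes tokens, shared over channels."""
    return W @ x

def depthwise_global_conv(x, W):
    """Depthwise view: for each channel independently, apply a filter
    whose support is all tokens and whose weights depend on the output
    position (spatial-specific) -- the same computation as above."""
    out = np.empty_like(x)
    for c in range(x.shape[1]):
        out[:, c] = W @ x[:, c]
    return out
```

Both functions compute the same result, showing the token-mixing MLP is a depthwise filter with a global receptive field.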
no code implementations • NAACL 2021 • Hongliang Fei, Tan Yu, Ping Li
Recent pretrained vision-language models have achieved impressive performance on cross-modal retrieval tasks in English.
no code implementations • 1 Jan 2021 • Tan Yu, Hongliang Fei, Ping Li
Inspired by the great success of BERT in NLP tasks, many text-vision BERT models emerged recently.
no code implementations • ICCV 2019 • Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan
In TSM, each action instance is modeled as a multi-phase process, and the phase evolution within an action instance, i.e., the temporal structure, is exploited.
Ranked #12 on Weakly Supervised Action Localization on ActivityNet-1.3 (mAP@0.5 metric)
no code implementations • ECCV 2018 • Tan Yu, Junsong Yuan, Chen Fang, Hailin Jin
Product quantization has been widely used in fast image retrieval due to its effectiveness in encoding high-dimensional visual features.
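Product quantization compresses a high-dimensional vector by splitting it into M sub-vectors and quantizing each against its own small codebook, so a d-dimensional float vector is stored as just M small integers. A minimal sketch with hypothetical shapes (M sub-spaces, K centroids each):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x: split into M sub-vectors, assign each to its
    nearest centroid in the corresponding codebook.
    codebooks has shape (M, K, d_sub); returns M centroid indices."""
    M, K, d_sub = codebooks.shape
    subs = x.reshape(M, d_sub)
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        dists = np.linalg.norm(codebooks[m] - subs[m], axis=1)
        codes[m] = dists.argmin()
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the
    selected centroids from each sub-space codebook."""
    return np.concatenate([codebooks[m, c] for m, c in enumerate(codes)])
```

With M = 8 and K = 256, for example, each vector is stored in 8 bytes, and distances to a query can be computed from M small lookup tables rather than full float vectors.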
no code implementations • CVPR 2018 • Tan Yu, Jingjing Meng, Junsong Yuan
View-based methods have achieved considerable success in 3D object recognition tasks.
no code implementations • ICCV 2017 • Tan Yu, Zhenzhen Wang, Junsong Yuan
Most current visual search systems focus on image-to-image (point-to-point) search, such as image and object retrieval.
no code implementations • CVPR 2017 • Tan Yu, Yuwei Wu, Junsong Yuan
This paper tackles the problem of efficient and effective object instance search in videos.