no code implementations • ECCV 2020 • Tan Yu, Yunfeng Cai, Ping Li
To boost the efficiency in the GPU platform, recent methods rely on Newton-Schulz (NS) iteration to approximate the matrix square-root.
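Newton-Schulz iteration approximates the matrix square root using only matrix multiplications, which is why it maps well to GPUs. A minimal numpy sketch of the coupled iteration follows; the pre-normalization scheme and iteration count here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate the square root of an SPD matrix A via coupled
    Newton-Schulz iteration (matmul-only, hence GPU-friendly)."""
    n = A.shape[0]
    norm = np.linalg.norm(A)   # pre-normalize so the iteration converges
    Y = A / norm               # Y converges to sqrt(A / norm)
    Z = np.eye(n)              # Z converges to the inverse square root
    I = np.eye(n)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Y * np.sqrt(norm)   # undo the normalization
```

Because convergence is quadratic, a handful of iterations typically suffices, avoiding the SVD that an exact square root would require.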
no code implementations • Findings (NAACL) 2022 • Jiaheng Liu, Tan Yu, Hanyu Peng, Mingming Sun, Ping Li
Existing multilingual video corpus moment retrieval (mVCMR) methods are mainly based on a two-stream structure.
no code implementations • EMNLP 2021 • Haoliang Liu, Tan Yu, Ping Li
Through an inflating operation followed by a shrinking operation, both efficiency and accuracy of a late-interaction model are boosted.
1 code implementation • 17 Mar 2024 • Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu
Meanwhile, our RebQ leverages extensive multi-modal knowledge from pre-trained LMMs to reconstruct the data of missing modality.
no code implementations • 25 Nov 2022 • Tan Yu, Ping Li
To bring back the global receptive field, window-based Vision Transformers have devoted considerable effort to achieving cross-window communication through sophisticated operations.
2 code implementations • 20 Nov 2022 • Shuo Chen, Tan Yu, Ping Li
Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
Ranked #1 on 3D Object Recognition on ModelNet40
no code implementations • 19 Oct 2022 • Yue Zhang, Hongliang Fei, Dingcheng Li, Tan Yu, Ping Li
In particular, we focus on few-shot image recognition tasks on pretrained vision-language models (PVLMs) and develop a method of prompting through prototype (PTP), where we define $K$ image prototypes and $K$ prompt prototypes.
no code implementations • 13 Oct 2022 • Tan Yu, Jun Zhi, Yufei Zhang, Jian Li, Hongliang Fei, Ping Li
In this paper, we formulate the APP-installation user embedding learning into a bipartite graph embedding problem.
no code implementations • 23 Sep 2022 • Tan Yu, Zhipeng Jin, Jie Liu, Yi Yang, Hongliang Fei, Ping Li
To overcome the limitations of behavior ID features in modeling new ads, we exploit the visual content in ads to boost the performance of CTR prediction models.
no code implementations • 19 Sep 2022 • Tan Yu, Jie Liu, Yi Yang, Yi Li, Hongliang Fei, Ping Li
How to pair the video ads with the user search is the core task of Baidu video advertising.
1 code implementation • 31 Jan 2022 • Tan Yu, Gangming Zhao, Ping Li, Yizhou Yu
To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows.
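Local self-attention restricts attention to non-overlapping windows, so the cost grows with window size rather than image size. A minimal sketch of the window-partition step that precedes the attention computation (the layout and window size are illustrative assumptions):

```python
import numpy as np

def window_partition(x, w):
    """Split an (H, W, C) feature map into non-overlapping w x w windows.
    Returns (num_windows, w*w, C); self-attention is then computed
    independently within each window of w*w tokens."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be divisible by w"
    x = x.reshape(H // w, w, W // w, w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)
```

With windows of size w, attention cost per window is O((w^2)^2) instead of O((HW)^2) for the full feature map.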
2 code implementations • 25 Oct 2021 • Shuo Chen, Tan Yu, Ping Li
Nevertheless, multi-view CNN models cannot model the communications between patches from different views, limiting their effectiveness in 3D object recognition.
Ranked #2 on 3D Object Recognition on ModelNet40
no code implementations • ICLR 2022 • Tan Yu, Jun Li, Yunfeng Cai, Ping Li
A convolution layer with an orthogonal Jacobian matrix is 1-Lipschitz in the 2-norm, making the output robust to the perturbation in input.
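The 1-Lipschitz property follows from the fact that an orthogonal linear map preserves the 2-norm: a perturbation of the input yields an output perturbation of exactly the same 2-norm. A small numerical check, using a random orthogonal matrix as a stand-in for the convolution's Jacobian (an illustrative assumption, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
# QR factorization of a Gaussian matrix yields an orthogonal Q,
# standing in here for an orthogonal Jacobian of a convolution layer.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

x = rng.standard_normal(8)
delta = 1e-3 * rng.standard_normal(8)   # input perturbation

out_gap = np.linalg.norm(Q @ (x + delta) - Q @ x)
in_gap = np.linalg.norm(delta)
# out_gap == in_gap: the map is exactly 1-Lipschitz in the 2-norm
```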
3 code implementations • 2 Aug 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
More recently, using smaller patches with a pyramid structure, Vision Permutator (ViP) and Global Filter Network (GFNet) achieve better performance than S$^2$-MLP.
no code implementations • 28 Jun 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
By introducing the inductive bias from image processing, the convolutional neural network (CNN) has achieved excellent performance in numerous computer vision tasks and has been established as the \emph{de facto} backbone.
1 code implementation • 14 Jun 2021 • Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
We discover that the token-mixing MLP is a variant of the depthwise convolution with a global receptive field and spatial-specific configuration.
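The equivalence can be seen directly: a token-mixing MLP applies one weight matrix along the token dimension, shared across channels, which is exactly a per-channel (depthwise) linear filter whose support covers all tokens. A minimal sketch of the two views (names and shapes are illustrative assumptions):

```python
import numpy as np

def token_mixing_mlp(x, W):
    """Token-mixing MLP view: x is (tokens, channels), and the weight
    matrix W of shape (tokens, tokens) mixes tokens, shared over channels."""
    return W @ x

def depthwise_global_conv(x, W):
    """Depthwise view: for each channel independently, apply a filter
    whose support is all tokens and whose weights depend on the output
    position (spatial-specific) -- the same computation as above."""
    out = np.empty_like(x)
    for c in range(x.shape[1]):
        out[:, c] = W @ x[:, c]
    return out
```

Both functions compute the same result, showing the token-mixing MLP is a depthwise filter with a global receptive field.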
no code implementations • NAACL 2021 • Hongliang Fei, Tan Yu, Ping Li
Recent pretrained vision-language models have achieved impressive performance on cross-modal retrieval tasks in English.
no code implementations • 1 Jan 2021 • Tan Yu, Hongliang Fei, Ping Li
Inspired by the great success of BERT in NLP tasks, many text-vision BERT models emerged recently.
no code implementations • ICCV 2019 • Tan Yu, Zhou Ren, Yuncheng Li, Enxu Yan, Ning Xu, Junsong Yuan
In TSM, each action instance is modeled as a multi-phase process, and the phase evolution within an action instance, i.e., the temporal structure, is exploited.
Ranked #12 on Weakly Supervised Action Localization on ActivityNet-1.3 (mAP@0.5 metric)
no code implementations • ECCV 2018 • Tan Yu, Junsong Yuan, Chen Fang, Hailin Jin
Product quantization has been widely used in fast image retrieval due to its effectiveness in encoding high-dimensional visual features.
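Product quantization compresses a high-dimensional vector by splitting it into M sub-vectors and quantizing each against its own small codebook, so a d-dimensional float vector is stored as just M small integers. A minimal sketch with hypothetical shapes (M sub-spaces, K centroids each):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x: split into M sub-vectors, assign each to its
    nearest centroid in the corresponding codebook.
    codebooks has shape (M, K, d_sub); returns M centroid indices."""
    M, K, d_sub = codebooks.shape
    subs = x.reshape(M, d_sub)
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        dists = np.linalg.norm(codebooks[m] - subs[m], axis=1)
        codes[m] = dists.argmin()
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the
    selected centroids from each sub-space codebook."""
    return np.concatenate([codebooks[m, c] for m, c in enumerate(codes)])
```

With M = 8 and K = 256, for example, each vector is stored in 8 bytes, and distances to a query can be computed from M small lookup tables rather than full float vectors.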
no code implementations • CVPR 2018 • Tan Yu, Jingjing Meng, Junsong Yuan
View-based methods have achieved considerable success in 3D object recognition tasks.
no code implementations • ICCV 2017 • Tan Yu, Zhenzhen Wang, Junsong Yuan
Most current visual search systems focus on image-to-image (point-to-point) search, such as image and object retrieval.
no code implementations • CVPR 2017 • Tan Yu, Yuwei Wu, Junsong Yuan
This paper tackles the problem of efficient and effective object instance search in videos.