Search Results for author: Guoli Song

Found 14 papers, 7 papers with code

Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning

1 code implementation • CVPR 2023 • Yu Wang, Pengchong Qiao, Chang Liu, Guoli Song, Xiawu Zheng, Jie Chen

We argue that an overlooked problem of robust SSL is its corrupted information on semantic level, practically limiting the development of the field.

Position Embedding Needs an Independent Layer Normalization

1 code implementation • 10 Dec 2022 • Runyi Yu, Zhennan Wang, Yinhuai Wang, Kehan Li, Yian Zhao, Jian Zhang, Guoli Song, Jie Chen

By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and patch embedding together) operates the same affine transformation to token embedding and PE, which limits the expressiveness of PE and hence constrains the performance of VTs.

Position

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.

Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)

Contrastive Learning • Representation Learning • +5

Fuzzy Positive Learning for Semi-supervised Semantic Segmentation

no code implementations • CVPR 2023 • Pengchong Qiao, Zhidan Wei, Yu Wang, Zhennan Wang, Guoli Song, Fan Xu, Xiangyang Ji, Chang Liu, Jie Chen

Semi-supervised learning (SSL) essentially pursues class boundary exploration with less dependence on human annotations.

Semi-Supervised Semantic Segmentation

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

no code implementations • CVPR 2023 • Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS).

Image Segmentation • Unsupervised Semantic Segmentation

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning • Optical Character Recognition (OCR) • +2

Locality Guidance for Improving Vision Transformers on Tiny Datasets

1 code implementation • 20 Jul 2022 • Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, Jie Chen

Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets.

$L_2$BN: Enhancing Batch Normalization by Equalizing the $L_2$ Norms of Features

no code implementations • 6 Jul 2022 • Zhennan Wang, Kehan Li, Runyi Yu, Yian Zhao, Pengchong Qiao, Chang Liu, Fan Xu, Xiangyang Ji, Guoli Song, Jie Chen

In this paper, we analyze batch normalization from the perspective of discriminability and find the disadvantages ignored by previous studies: the difference in $l_2$ norms of sample features can hinder batch normalization from obtaining more distinguished inter-class features and more compact intra-class features.

Acoustic Scene Classification • Image Classification • +1

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

no code implementations • CVPR 2022 • Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics.

Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Contrastive Learning • Cross-Modal Retrieval • +1

CDNet: Centripetal Direction Network for Nuclear Instance Segmentation

1 code implementation • ICCV 2021 • Hongliang He, Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Qian Ren, Pengxu Wei, Zhiqiang Gao, Jie Chen

Specifically, we define the centripetal direction feature as a class of adjacent directions pointing to the nuclear center to represent the spatial relationship between pixels within the nucleus.

Instance Segmentation • Segmentation • +1

Learning fragment self-attention embeddings for image-text matching

1 code implementation • ACMMM 2019 • Yiling Wu, Shuhui Wang, Guoli Song, Qingming Huang

In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts by self-attention mechanism, and aggregate fragment information into visual and textual embeddings.

Image-text matching • Sentence • +1

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

1 code implementation • 14 Aug 2019 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Multimodal learning aims to discover the relationship between multiple modalities.

Cross-Modal Retrieval • Retrieval

Multimodal Gaussian Process Latent Variable Models With Harmonization

no code implementations • ICCV 2017 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs.

Cross-Modal Retrieval • Retrieval

Similarity Gaussian Process Latent Variable Model for Multi-Modal Data Analysis

no code implementations • ICCV 2015 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.

Retrieval
