Search Results for author: Guoli Song

Found 14 papers, 7 papers with code

Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning

1 code implementation CVPR 2023 Yu Wang, Pengchong Qiao, Chang Liu, Guoli Song, Xiawu Zheng, Jie Chen

We argue that an overlooked problem of robust SSL is its corrupted information at the semantic level, which practically limits the development of the field.

Position Embedding Needs an Independent Layer Normalization

1 code implementation 10 Dec 2022 Runyi Yu, Zhennan Wang, Yinhuai Wang, Kehan Li, Yian Zhao, Jian Zhang, Guoli Song, Jie Chen

By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and patch embedding together) applies the same affine transformation to the token embedding and the PE, which limits the expressiveness of the PE and hence constrains the performance of VTs.

Position
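The claim about the default joining method can be checked with a little arithmetic. The sketch below is a minimal pure-Python illustration, not the authors' code: it assumes a single LayerNorm with per-channel scale `gamma` and shift `beta` applied to the sum of a patch embedding `x` and a position embedding `p`. Because the mean of the sum equals the sum of the means, the normalized output splits into a token term and a PE term that share the same `gamma` and the same standard deviation `s`. The `independent` function is a hypothetical decoupled variant that gives each input its own LayerNorm, in the spirit of the paper's title; all function names are illustrative.

```python
import math

def layer_norm(v, gamma, beta, eps=1e-5):
    """LayerNorm over one token vector with per-channel affine parameters."""
    n = len(v)
    mu = sum(v) / n
    var = sum((a - mu) ** 2 for a in v) / n
    s = math.sqrt(var + eps)
    return [g * (a - mu) / s + b for a, g, b in zip(v, gamma, beta)]

def joint(x, p, gamma, beta):
    # Default joining: add the PE to the patch embedding, then one shared LayerNorm.
    return layer_norm([a + b for a, b in zip(x, p)], gamma, beta)

def joint_decomposed(x, p, gamma, beta, eps=1e-5):
    # Same output as joint(), rewritten as separate token and PE terms.
    n = len(x)
    z = [a + b for a, b in zip(x, p)]
    mz = sum(z) / n
    s = math.sqrt(sum((a - mz) ** 2 for a in z) / n + eps)  # shared std of the sum
    mx, mp = sum(x) / n, sum(p) / n
    # Both terms are scaled by the same gamma and divided by the same s.
    return [g * (a - mx) / s + g * (b - mp) / s + c
            for a, b, g, c in zip(x, p, gamma, beta)]

def independent(x, p, gx, bx, gp, bp):
    # Hypothetical decoupled variant: separate LayerNorm (own affine) for tokens and PE.
    return [a + b for a, b in zip(layer_norm(x, gx, bx), layer_norm(p, gp, bp))]
```

Comparing `joint` and `joint_decomposed` on any inputs shows they agree, which is the sense in which one affine transformation is applied to both the token embedding and the PE.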

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

4 code implementations 21 Nov 2022 Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.

Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)

Contrastive Learning Representation Learning +5
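For readers unfamiliar with the CLIP-style objective the abstract refers to, here is a minimal pure-Python sketch of a symmetric contrastive (InfoNCE) loss over a batch of paired video and text features. It illustrates only the baseline described in the abstract, not EMCL's expectation-maximization step; the function names and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def clip_loss(video, text, temperature=0.07):
    """Symmetric InfoNCE: row i of video is the positive pair of row i of text."""
    v = [l2_normalize(x) for x in video]
    t = [l2_normalize(x) for x in text]
    n = len(v)
    # Cosine-similarity logits between every video and every text in the batch.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy_rows(m):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(m):
            mx = max(row)
            lse = mx + math.log(sum(math.exp(z - mx) for z in row))
            total += lse - row[i]
        return total / len(m)

    cols = [list(c) for c in zip(*logits)]  # text-to-video direction
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(cols))
```

Minimizing this loss pulls matched video-text pairs together in the shared space and pushes mismatched pairs in the same batch apart.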

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

no code implementations CVPR 2023 Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS).

Image Segmentation Unsupervised Semantic Segmentation

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

no code implementations 21 Sep 2022 Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning Optical Character Recognition (OCR) +2

Locality Guidance for Improving Vision Transformers on Tiny Datasets

1 code implementation 20 Jul 2022 Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, Jie Chen

Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets.

$L_2$BN: Enhancing Batch Normalization by Equalizing the $L_2$ Norms of Features

no code implementations 6 Jul 2022 Zhennan Wang, Kehan Li, Runyi Yu, Yian Zhao, Pengchong Qiao, Chang Liu, Fan Xu, Xiangyang Ji, Guoli Song, Jie Chen

In this paper, we analyze batch normalization from the perspective of discriminability and identify a drawback overlooked by previous studies: differences in the $l_2$ norms of sample features can prevent batch normalization from producing more distinguishable inter-class features and more compact intra-class features.

Acoustic Scene Classification Image Classification +1
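A minimal sketch of the idea in the title, assuming L2BN amounts to rescaling each sample's feature vector to unit $L_2$ norm before a standard batch normalization (training-mode statistics, no learnable affine parameters). The helper names are hypothetical and this is not the authors' implementation. One consequence the sketch makes visible: multiplying a sample's features by a positive constant no longer changes the normalized output, so differences in feature norm across samples cannot influence the result.

```python
import math

def l2_normalize_rows(batch, eps=1e-12):
    # Rescale each sample (row) to unit L2 norm so all samples have equal norm.
    out = []
    for row in batch:
        norm = max(math.sqrt(sum(a * a for a in row)), eps)
        out.append([a / norm for a in row])
    return out

def batch_norm(batch, eps=1e-5):
    # Standard BN in training mode: per-channel mean and variance over the batch.
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((a - m) ** 2 for a in c) / n + eps)
            for c, m in zip(cols, means)]
    return [[(a - m) / s for a, m, s in zip(row, means, stds)] for row in batch]

def l2bn(batch):
    # Hypothetical L2BN sketch: equalize sample norms, then batch-normalize.
    return batch_norm(l2_normalize_rows(batch))
```

For example, `l2bn([[1.0, 2.0], [3.0, 1.0]])` and `l2bn([[10.0, 20.0], [3.0, 1.0]])` produce the same output, whereas plain `batch_norm` would treat the scaled sample differently.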

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

no code implementations CVPR 2022 Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics.

Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Contrastive Learning Cross-Modal Retrieval +1

CDNet: Centripetal Direction Network for Nuclear Instance Segmentation

1 code implementation ICCV 2021 Hongliang He, Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Qian Ren, Pengxu Wei, Zhiqiang Gao, Jie Chen

Specifically, we define the centripetal direction feature as a class of adjacent directions pointing to the nuclear center to represent the spatial relationship between pixels within the nucleus.

Instance Segmentation Segmentation +1
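To make the notion of a quantized centripetal direction concrete, here is a simplified pure-Python sketch under stated assumptions: the nucleus is given as a list of (row, column) pixel coordinates, its center is taken to be the centroid, and the direction from each pixel toward that center is quantized into `num_dirs = 8` angular classes. The centroid choice, the number of classes, and the function name are assumptions for illustration; CDNet's actual label construction may differ.

```python
import math

def centripetal_directions(pixels, num_dirs=8):
    """Map each (row, col) pixel of one nucleus to a direction class in [0, num_dirs)."""
    cy = sum(r for r, _ in pixels) / len(pixels)
    cx = sum(c for _, c in pixels) / len(pixels)
    step = 2 * math.pi / num_dirs
    labels = {}
    for r, c in pixels:
        dy, dx = cy - r, cx - c  # vector pointing from the pixel to the center
        if abs(dy) < 1e-9 and abs(dx) < 1e-9:
            labels[(r, c)] = -1  # pixel sits on the center: no direction
            continue
        # Angle in [0, 2*pi): 0 points toward +col, pi/2 toward +row (image rows).
        angle = math.atan2(dy, dx) % (2 * math.pi)
        labels[(r, c)] = int(round(angle / step)) % num_dirs
    return labels
```

On a 3x3 square, for instance, the pixel left of the center gets class 0 (pointing right, toward the center) and the pixel right of it gets class 4.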

Learning fragment self-attention embeddings for image-text matching

1 code implementation ACMMM 2019 Yiling Wu, Shuhui Wang, Guoli Song, Qingming Huang

In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts via a self-attention mechanism and to aggregate fragment information into visual and textual embeddings.

Image-text matching Sentence +1
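The general mechanism named here, self-attention over fragments followed by aggregation into a single embedding, can be sketched in pure Python. This is a generic single-head scaled dot-product self-attention with identity query/key/value projections and mean pooling, assumed for brevity; SAEM's actual encoders and learned projections are not reproduced, and all function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(frags):
    # Each fragment attends to all fragments (identity Q/K/V projections assumed).
    d = len(frags[0])
    out = []
    for q in frags:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frags])
        out.append([sum(wi * k[c] for wi, k in zip(w, frags)) for c in range(d)])
    return out

def embed(frags):
    # Aggregate attended fragments by mean pooling, then L2-normalize.
    att = self_attention(frags)
    pooled = [sum(col) / len(att) for col in zip(*att)]
    n = math.sqrt(sum(a * a for a in pooled))
    return [a / n for a in pooled]

def similarity(img_frags, txt_frags):
    # Cosine similarity between the image and text embeddings.
    return sum(a * b for a, b in zip(embed(img_frags), embed(txt_frags)))
```

Here image fragments might be region features and text fragments word features, both assumed to already live in the same dimension; a real model would learn projections that map them there.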

Similarity Gaussian Process Latent Variable Model for Multi-Modal Data Analysis

no code implementations ICCV 2015 Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.

Retrieval
