Search Results for author: Songyang Zhang

Found 57 papers, 30 papers with code

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations 31 May 2023 Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

Counterfactual Inference Question Answering +1

Radiomap Inpainting for Restricted Areas based on Propagation Priority and Depth Map

no code implementations 24 May 2023 Songyang Zhang, Tianhang Yu, Brian Choi, Feng Ouyang, Zhi Ding

Providing rich and useful information regarding spectrum activities and propagation channels, radiomaps characterize the detailed distribution of power spectral density (PSD) and are important tools for network planning in modern wireless systems.

TG-VQA: Ternary Game of Video Question Answering

no code implementations 17 May 2023 Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims to answer a question about video content by reasoning about the semantic alignment between the question and the video.

Contrastive Learning Question Answering +2

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations 12 Apr 2023 Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

1 code implementation 4 Mar 2023 YuAn Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.

Self-Supervised Learning

Temporal Segment Transformer for Action Segmentation

no code implementations 25 Feb 2023 Zhichao Liu, Leshan Wang, Desen Zhou, Jian Wang, Songyang Zhang, Yang Bai, Errui Ding, Rui Fan

To deal with these issues, we propose an attention-based approach, which we call temporal segment transformer, for joint segment relation modeling and denoising.

Action Segmentation Denoising

Dynamic Grained Encoder for Vision Transformers

1 code implementation NeurIPS 2021 Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.

Image Classification Language Modelling +2

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations CVPR 2023 Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

1 code implementation 22 Oct 2022 Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

While previous work focuses on building systems for inducing grammars on text that is well aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.

Language Acquisition Video Alignment

Make-A-Video: Text-to-Video Generation without Text-Video Data

1 code implementation 29 Sep 2022 Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Image Generation Super-Resolution +2

Exemplar-Based Radio Map Reconstruction of Missing Areas Using Propagation Priority

no code implementations 10 Sep 2022 Songyang Zhang, Tianhang Yu, Jonathan Tivald, Brian Choi, Feng Ouyang, Zhi Ding

A radio map describes network coverage and is a practically important tool for network planning in modern wireless systems.

Learning Semantic Correspondence with Sparse Annotations

no code implementations 15 Aug 2022 Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava

In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations.

Denoising Semantic correspondence

Rethinking the Evaluation of Unbiased Scene Graph Generation

no code implementations 3 Aug 2022 Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, Jun Xiao

Current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones due to the severely imbalanced distribution of predicates.

Graph Generation Unbiased Scene Graph Generation

Action Quality Assessment with Temporal Parsing Transformer

1 code implementation 19 Jul 2022 Yang Bai, Desen Zhou, Songyang Zhang, Jian Wang, Errui Ding, Yu Guan, Yang Long, Jingdong Wang

Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.

Action Quality Assessment Action Understanding +1

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

1 code implementation CVPR 2022 Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, Jun Xiao

Then, in Pos-NSD, we use a clustering-based algorithm to divide all positive samples into multiple sets, and treat the samples in the noisiest set as noisy positive samples.

Graph Generation Out-of-Distribution Detection +2

Budget-aware Few-shot Learning via Graph Convolutional Network

no code implementations 7 Jan 2022 Shipeng Yan, Songyang Zhang, Xuming He

In this work, we introduce a new budget-aware few-shot learning problem that not only aims to learn novel object categories, but also needs to select informative examples to annotate in order to achieve data efficiency.

Few-Shot Learning Informativeness

SGTR: End-to-end Scene Graph Generation with Transformer

1 code implementation CVPR 2022 Rongjie Li, Songyang Zhang, Xuming He

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.

graph construction Graph Generation +1

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation

no code implementations 3 Sep 2021 Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao

Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans.

Medical Diagnosis

Signal Processing over Multilayer Graphs: Theoretical Foundations and Practical Applications

1 code implementation 31 Aug 2021 Songyang Zhang, Qinwen Deng, Zhi Ding

To generalize traditional graph signal processing (GSP) over multilayer graphs for analyzing multi-level signal features and their interactions, this work proposes a tensor-based framework of multilayer graph signal processing (M-GSP).

Image Processing via Multilayer Graph Spectra

no code implementations 31 Aug 2021 Songyang Zhang, Qinwen Deng, Zhi Ding

Graph signal processing (GSP) has become an important tool in image processing because of its ability to reveal underlying data structures.

Edge Detection Hyperspectral Image Segmentation +3

An EM Framework for Online Incremental Learning of Semantic Segmentation

1 code implementation 8 Aug 2021 Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, Xuming He

Incremental learning of semantic segmentation has emerged as a promising strategy for visual scene interpretation in the open-world setting.

Incremental Learning Semantic Segmentation

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

1 code implementation ICCV 2021 Zhengyuan Yang, Songyang Zhang, LiWei Wang, Jiebo Luo

3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.

Representation Learning Visual Grounding

Learning Implicit Temporal Alignment for Few-shot Video Classification

1 code implementation 11 May 2021 Songyang Zhang, Jiale Zhou, Xuming He

Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.

Action Recognition In Videos Classification +2

Video-aided Unsupervised Grammar Induction

1 code implementation NAACL 2021 Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.

Optical Character Recognition (OCR)

Boundary Proposal Network for Two-Stage Natural Language Video Localization

no code implementations 15 Mar 2021 Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao

State-of-the-art NLVL methods are almost all one-stage and can typically be grouped into two categories: 1) anchor-based approaches, which first pre-define a series of video segment candidates (e.g., by sliding window) and then classify each candidate; 2) anchor-free approaches, which directly predict the probability of each video frame being a boundary or an intermediate frame inside the positive segment.

Vocal Bursts Valence Prediction

An Efficient Hypergraph Approach to Robust Point Cloud Resampling

no code implementations 11 Mar 2021 Qinwen Deng, Songyang Zhang, Zhi Ding

Efficient processing and feature extraction of large-scale point clouds are important in related computer vision and cyber-physical systems.

Point Cloud Resampling Through Hypergraph Signal Processing

no code implementations 12 Feb 2021 Qinwen Deng, Songyang Zhang, Zhi Ding

By directly estimating hypergraph spectrum based on hypergraph stationary processing, we design a spectral kernel-based filter to capture high-dimensional interactions among point signal nodes and to better preserve object surface outlines.

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

1 code implementation 4 Dec 2020 Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo

It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

Content-based Analysis of the Cultural Differences between TikTok and Douyin

no code implementations 3 Nov 2020 Li Sun, Haoqi Zhang, Songyang Zhang, Jiebo Luo

Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.

Object Detection

Transformer with Bidirectional Decoder for Speech Recognition

no code implementations 11 Aug 2020 Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin

To demonstrate our proposed speech transformer with a bidirectional decoder (STBD), we conduct extensive experiments on the AISHELL-1 dataset.

Automatic Speech Recognition (ASR) +3

From Spectrum Wavelet to Vertex Propagation: Graph Convolutional Networks Based on Taylor Approximation

no code implementations 1 Jul 2020 Songyang Zhang, Han Zhang, Shuguang Cui, Zhi Ding

Graph convolutional networks (GCN) have been recently utilized to extract the underlying structures of datasets with some labeled data and high-dimensional features.

Node Classification

Global Image Sentiment Transfer

no code implementations 22 Jun 2020 Jie An, Tianlang Chen, Songyang Zhang, Jiebo Luo

This work proposes a novel framework consisting of a reference image retrieval step and a global sentiment transfer step to transfer sentiments of images according to a given sentiment tag.

Image Retrieval Retrieval +3

Point Cloud Segmentation based on Hypergraph Spectral Clustering

no code implementations 21 Jan 2020 Songyang Zhang, Shuguang Cui, Zhi Ding

Hypergraph spectral analysis has emerged as an effective tool for processing complex data structures in data analysis.

Point Cloud Segmentation

Hypergraph Spectral Analysis and Processing in 3D Point Cloud

no code implementations 8 Jan 2020 Songyang Zhang, Shuguang Cui, Zhi Ding

Along with increasingly popular virtual reality applications, the three-dimensional (3D) point cloud has become a fundamental data structure to characterize 3D objects and surroundings.


Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

3 code implementations 8 Dec 2019 Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo

We address the problem of retrieving a specific moment from an untrimmed video by a query sentence.

Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization

2 code implementations 8 Dec 2019 Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo

In this report, we introduce the winning method for the HACS Temporal Action Localization Challenge 2019.

Temporal Action Localization

An Evaluation of BBR and its variants

1 code implementation 9 Sep 2019 Songyang Zhang

Congestion control algorithms are important because they keep network links from falling into severe congestion and ensure normal network operation.

Networking and Internet Architecture

Dynamic Context Correspondence Network for Semantic Alignment

1 code implementation ICCV 2019 Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He

We instantiate our strategy by designing an end-to-end learnable deep network, named Dynamic Context Correspondence Network (DCCNet).

Semantic correspondence Weakly-supervised Learning

Exploiting Temporal Relationships in Video Moment Localization with Natural Language

1 code implementation 11 Aug 2019 Songyang Zhang, Jinsong Su, Jiebo Luo

We address the problem of video moment localization with natural language, i.e., localizing a video segment described by a natural language sentence.

Introducing Hypergraph Signal Processing: Theoretical Foundation and Practical Applications

no code implementations 22 Jul 2019 Songyang Zhang, Zhi Ding, Shuguang Cui

Signal processing over graphs has recently attracted significant attention for dealing with structured data.

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

1 code implementation 28 May 2019 Songyang Zhang, Shipeng Yan, Xuming He

A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation.

Congestion Control for RTP Media: a Comparison on Simulated Environment

2 code implementations 2 Sep 2018 Songyang Zhang

Developing low-latency congestion control algorithms for real-time traffic has gained attention recently.

Networking and Internet Architecture

Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning

1 code implementation 27 Oct 2017 Yuhang Song, Mai Xu, Songyang Zhang, Liangyu Huo

However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MT-RL), as multiple tasks can refer to different kinds of representations.

Atari Games Multi-Task Learning +1

Predicting Salient Face in Multiple-Face Videos

1 code implementation CVPR 2017 Yufan Liu, Songyang Zhang, Mai Xu, Xuming He

On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces.

Saliency Prediction
