Search Results for author: Songyang Zhang

Found 77 papers, 42 papers with code

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations 12 Apr 2023 Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

1 code implementation 4 Mar 2023 YuAn Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.

Self-Supervised Learning

Make-A-Video: Text-to-Video Generation without Text-Video Data

2 code implementations 29 Sep 2022 Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Ranked #3 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Super-Resolution +2

Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization

2 code implementations 8 Dec 2019 Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo

In this report, we introduce the winning method for the HACS Temporal Action Localization Challenge 2019.

Temporal Action Localization

Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

3 code implementations 8 Dec 2019 Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo

We address the problem of retrieving a specific moment from an untrimmed video by a query sentence.

Sentence

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

1 code implementation 4 Dec 2020 Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo

It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

1 code implementation 16 Jan 2024 Huanjun Kong, Songyang Zhang, Kai Chen

In this work, we present HuixiangDou, a technical assistant powered by Large Language Models (LLM).

In-Context Learning

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

1 code implementation 9 Feb 2024 Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math.

Data Augmentation GSM8K +3

LawBench: Benchmarking Legal Knowledge of Large Language Models

1 code implementation 28 Sep 2023 Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge

We hope this benchmark provides an in-depth understanding of the LLMs' domain-specified capabilities and speeds up the development of LLMs in the legal domain.

Benchmarking Memorization +1

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

1 code implementation 20 Oct 2023 Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen

In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability.

Instruction Following

An Evaluation of BBR and its variants

1 code implementation 9 Sep 2019 Songyang Zhang

Congestion control algorithms are important because they keep network links from falling into severe congestion and guarantee normal network operation.

Networking and Internet Architecture

LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

1 code implementation 28 May 2019 Songyang Zhang, Shipeng Yan, Xuming He

A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation.

SGTR: End-to-end Scene Graph Generation with Transformer

1 code implementation CVPR 2022 Rongjie Li, Songyang Zhang, Xuming He

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.

graph construction Graph Generation +1

SGTR+: End-to-end Scene Graph Generation with Transformer

1 code implementation 23 Jan 2024 Rongjie Li, Songyang Zhang, Xuming He

Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner.

graph construction Graph Generation +1

Congestion Control for RTP Media: a Comparison on Simulated Environment

2 code implementations 2 Sep 2018 Songyang Zhang

Developing low-latency congestion control algorithms for real-time traffic has gained attention recently.

Networking and Internet Architecture

Video-aided Unsupervised Grammar Induction

1 code implementation NAACL 2021 Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.

Optical Character Recognition (OCR)

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

1 code implementation 22 Oct 2022 Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

While previous work focuses on building systems for inducing grammars on text that is well-aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.

Language Acquisition Video Alignment

Dynamic Grained Encoder for Vision Transformers

1 code implementation NeurIPS 2021 Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.

Image Classification Language Modelling +2

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

1 code implementation ICCV 2021 Zhengyuan Yang, Songyang Zhang, LiWei Wang, Jiebo Luo

3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.

Object Representation Learning +1

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

1 code implementation CVPR 2022 Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, Jun Xiao

Then, in Pos-NSD, we use a clustering-based algorithm to divide all positive samples into multiple sets, and treat the samples in the noisiest set as noisy positive samples.

Graph Generation Out-of-Distribution Detection +2

Dynamic Context Correspondence Network for Semantic Alignment

1 code implementation ICCV 2019 Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, Shipeng Yan, Xuming He

We instantiate our strategy by designing an end-to-end learnable deep network, named Dynamic Context Correspondence Network (DCCNet).

Semantic correspondence Weakly-supervised Learning

Action Quality Assessment with Temporal Parsing Transformer

1 code implementation 19 Jul 2022 Yang Bai, Desen Zhou, Songyang Zhang, Jian Wang, Errui Ding, Yu Guan, Yang Long, Jingdong Wang

Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.

Action Quality Assessment Action Understanding +1

Learning Implicit Temporal Alignment for Few-shot Video Classification

1 code implementation 11 May 2021 Songyang Zhang, Jiale Zhou, Xuming He

Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.

Action Recognition In Videos Classification +2

Learning Semantic Correspondence with Sparse Annotations

1 code implementation 15 Aug 2022 Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, Abhinav Shrivastava

In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations.

Denoising Semantic correspondence

Exploiting Temporal Relationships in Video Moment Localization with Natural Language

1 code implementation 11 Aug 2019 Songyang Zhang, Jinsong Su, Jiebo Luo

We address the problem of video moment localization with natural language, i.e., localizing a video segment described by a natural language sentence.

Sentence

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

1 code implementation 21 Mar 2024 Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense.

Benchmarking Memorization

An EM Framework for Online Incremental Learning of Semantic Segmentation

1 code implementation 8 Aug 2021 Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, Xuming He

Incremental learning of semantic segmentation has emerged as a promising strategy for visual scene interpretation in the open-world setting.

Incremental Learning Missing Labels +2

Predicting Salient Face in Multiple-Face Videos

1 code implementation CVPR 2017 Yufan Liu, Songyang Zhang, Mai Xu, Xuming He

On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces.

Saliency Prediction

Signal Processing over Multilayer Graphs: Theoretical Foundations and Practical Applications

1 code implementation 31 Aug 2021 Songyang Zhang, Qinwen Deng, Zhi Ding

To generalize traditional graph signal processing (GSP) over multilayer graphs for analyzing multi-level signal features and their interactions, this work proposes a tensor-based framework of multilayer graph signal processing (M-GSP).

Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning

1 code implementation 27 Oct 2017 Yuhang Song, Main Xu, Songyang Zhang, Liangyu Huo

However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MT-RL), as multiple tasks can refer to different kinds of representations.

Atari Games Multi-Task Learning +1

Hypergraph Spectral Analysis and Processing in 3D Point Cloud

no code implementations 8 Jan 2020 Songyang Zhang, Shuguang Cui, Zhi Ding

Along with increasingly popular virtual reality applications, the three-dimensional (3D) point cloud has become a fundamental data structure to characterize 3D objects and surroundings.

Denoising

Global Image Sentiment Transfer

no code implementations 22 Jun 2020 Jie An, Tianlang Chen, Songyang Zhang, Jiebo Luo

This work proposes a novel framework consisting of a reference image retrieval step and a global sentiment transfer step to transfer sentiments of images according to a given sentiment tag.

Image Retrieval Retrieval +3

From Spectrum Wavelet to Vertex Propagation: Graph Convolutional Networks Based on Taylor Approximation

no code implementations 1 Jul 2020 Songyang Zhang, Han Zhang, Shuguang Cui, Zhi Ding

Graph convolutional networks (GCN) have been recently utilized to extract the underlying structures of datasets with some labeled data and high-dimensional features.

Node Classification

Transformer with Bidirectional Decoder for Speech Recognition

no code implementations 11 Aug 2020 Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin

To demonstrate our proposed speech transformer with a bidirectional decoder (STBD), we conduct extensive experiments on the AISHELL-1 dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Content-based Analysis of the Cultural Differences between TikTok and Douyin

no code implementations 3 Nov 2020 Li Sun, Haoqi Zhang, Songyang Zhang, Jiebo Luo

Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.

Object object-detection +1

An Efficient Hypergraph Approach to Robust Point Cloud Resampling

no code implementations 11 Mar 2021 Qinwen Deng, Songyang Zhang, Zhi Ding

Efficient processing and feature extraction of large-scale point clouds are important in related computer vision and cyber-physical systems.

Boundary Proposal Network for Two-Stage Natural Language Video Localization

no code implementations 15 Mar 2021 Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao

State-of-the-art NLVL methods are almost in one-stage fashion, which can be typically grouped into two categories: 1) anchor-based approach: it first pre-defines a series of video segment candidates (e.g., by sliding window), and then does classification for each candidate; 2) anchor-free approach: it directly predicts the probabilities for each video frame as a boundary or intermediate frame inside the positive segment.

Vocal Bursts Valence Prediction

Point Cloud Resampling Through Hypergraph Signal Processing

no code implementations 12 Feb 2021 Qinwen Deng, Songyang Zhang, Zhi Ding

By directly estimating hypergraph spectrum based on hypergraph stationary processing, we design a spectral kernel-based filter to capture high-dimensional interactions among point signal nodes and to better preserve object surface outlines.

Object

Point Cloud Segmentation based on Hypergraph Spectral Clustering

no code implementations 21 Jan 2020 Songyang Zhang, Shuguang Cui, Zhi Ding

Hypergraph spectral analysis has emerged as an effective tool for processing complex data structures in data analysis.

Clustering Point Cloud Segmentation +1

Introducing Hypergraph Signal Processing: Theoretical Foundation and Practical Applications

no code implementations 22 Jul 2019 Songyang Zhang, Zhi Ding, Shuguang Cui

Signal processing over graphs has recently attracted significant attention for dealing with structured data.

Image Processing via Multilayer Graph Spectra

no code implementations 31 Aug 2021 Songyang Zhang, Qinwen Deng, Zhi Ding

Graph signal processing (GSP) has become an important tool in image processing because of its ability to reveal underlying data structures.

Edge Detection Hyperspectral Image Segmentation +3

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation

no code implementations 3 Sep 2021 Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao

Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans.

Medical Diagnosis

Budget-aware Few-shot Learning via Graph Convolutional Network

no code implementations 7 Jan 2022 Shipeng Yan, Songyang Zhang, Xuming He

In this work, we introduce a new budget-aware few-shot learning problem that not only aims to learn novel object categories, but also needs to select informative examples to annotate in order to achieve data efficiency.

Few-Shot Learning Informativeness

Rethinking the Evaluation of Unbiased Scene Graph Generation

no code implementations 3 Aug 2022 Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, Jun Xiao

Current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones due to the severe imbalanced distribution of predicates.

Graph Generation Unbiased Scene Graph Generation

Exemplar-Based Radio Map Reconstruction of Missing Areas Using Propagation Priority

no code implementations 10 Sep 2022 Songyang Zhang, Tianhang Yu, Jonathan Tivald, Brian Choi, Feng Ouyang, Zhi Ding

A radio map describes network coverage and is a practically important tool for network planning in modern wireless systems.

Temporal Segment Transformer for Action Segmentation

no code implementations 25 Feb 2023 Zhichao Liu, Leshan Wang, Desen Zhou, Jian Wang, Songyang Zhang, Yang Bai, Errui Ding, Rui Fan

To deal with these issues, we propose an attention-based approach, which we call temporal segment transformer, for joint segment relation modeling and denoising.

Action Segmentation Denoising +1

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations CVPR 2023 Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

TG-VQA: Ternary Game of Video Question Answering

no code implementations 17 May 2023 Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning the alignment semantics within them.

Contrastive Learning Question Answering +2

Radiomap Inpainting for Restricted Areas based on Propagation Priority and Depth Map

no code implementations 24 May 2023 Songyang Zhang, Tianhang Yu, Brian Choi, Feng Ouyang, Zhi Ding

Providing rich and useful information regarding spectrum activities and propagation channels, radiomaps characterize the detailed distribution of power spectral density (PSD) and are important tools for network planning in modern wireless systems.

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations 31 May 2023 Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

counterfactual Counterfactual Inference +2

UFed-GAN: A Secure Federated Learning Framework with Constrained Computation and Unlabeled Data

no code implementations 10 Aug 2023 Achintha Wijesinghe, Songyang Zhang, Siyu Qi, Zhi Ding

To satisfy the broad applications and insatiable hunger for deploying low latency multimedia data classification and data privacy in a cloud-based setting, federated learning (FL) has emerged as an important learning paradigm.

Federated Learning Generative Adversarial Network

PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning

no code implementations 23 Aug 2023 Achintha Wijesinghe, Songyang Zhang, Zhi Ding

Recent advances in generative learning models are accompanied by growing interest in federated learning (FL) based on generative adversarial network (GAN) models.

Generative Adversarial Network Personalized Federated Learning

Reinforcement Learning for Robust Header Compression under Model Uncertainty

no code implementations 23 Sep 2023 Shusen Jing, Songyang Zhang, Zhi Ding

Robust header compression (ROHC), critically positioned between the network and the MAC layers, plays an important role in modern wireless communication systems for improving data efficiency.

reinforcement-learning Reinforcement Learning (RL)

Fake Alignment: Are LLMs Really Aligned Well?

no code implementations 10 Nov 2023 Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang

To address this, we introduce the Fake alIgNment Evaluation (FINE) framework and two novel metrics--Consistency Score (CS) and Consistent Safety Score (CSS), which jointly assess two complementary forms of evaluation to quantify fake alignment and obtain corrected performance estimates.

Multiple-choice

Diff-GO: Diffusion Goal-Oriented Communications to Achieve Ultra-High Spectrum Efficiency

no code implementations 13 Nov 2023 Achintha Wijesinghe, Songyang Zhang, Suchinthaka Wanninayaka, Weiwei Wang, Zhi Ding

This work presents an ultra-efficient communication design by utilizing generative AI based on diffusion models as a specific example of the general goal-oriented communication framework.

RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

no code implementations 25 Mar 2024 Xiaojie Li, Songyang Zhang, Hang Li, Xiaoyang Li, Lexi Xu, Haigao Xu, Hui Mei, Guangxu Zhu, Nan Qi, Ming Xiao

Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning.

Graph Attention
