Search Results for author: Siyu Zhang

Found 22 papers, 13 papers with code

SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

no code implementations • 10 Apr 2024 • Diankun Zhang, Guoan Wang, Runwen Zhu, Jianbo Zhao, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, Feiyang Tan, Hangning Zhou, Ziyao Xu, Haotian Yao, Chi Zhang, Xiaojun Liu, Xiaoguang Di, Bin Li

End-to-End paradigms use a unified framework to implement multi-tasks in an autonomous driving system.

Autonomous Driving motion prediction

Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations

1 code implementation • 28 Feb 2024 • Shen Cai, Zhanhao Wu, Lingxi Guo, Jiachun Wang, Siyu Zhang, Junchi Yan, Shuhan Shen

Under the minimal $4$-point configuration, the first and the last similarity transformations in SKS are computed by two anchor points on target and source planes, respectively.

Computational Efficiency

A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network

no code implementations • 17 Dec 2023 • Siyu Zhang, Wendong Mao, Huihong Shi, Zhongfeng Wang

Video compression is widely used in digital television, surveillance systems, and virtual reality.

Video Compression

KNVQA: A Benchmark for evaluation knowledge-based VQA

no code implementations • 21 Nov 2023 • Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan

Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems.

Hallucination Visual Question Answering (VQA)

Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

no code implementations • 20 Oct 2023 • Siyu Zhang, Yeming Chen, Sirui Cheng, Yaoru Sun, Jun Yang, Lizhi Bai

It parses the entire image as a fine-to-coarse hierarchical structure of constituent visual patterns, and captures multiscale features by progressively merging adjacent superpixels as graph nodes.

Self-Supervised Learning Superpixels +1

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout

1 code implementation • 24 Sep 2023 • Haoran Wang, Zeshen Tang, Leya Yang, Yaoru Sun, Fang Wang, Siyu Zhang, Yeming Chen

Here, we propose a goal-conditioned HRL framework named Guided Cooperation via Model-based Rollout (GCMR), aiming to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics.

Hierarchical Reinforcement Learning reinforcement-learning +1

Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning

no code implementations • 18 Aug 2023 • Yeming Chen, Siyu Zhang, Yaoru Sun, Weijian Liang, Haoran Wang

In this work, we propose an efficient computation framework for multimodal alignment by introducing a novel visual semantic module to further improve the performance of the VL tasks.

Computational Efficiency Contrastive Learning +2

LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain of Teach Prompts

1 code implementation • 11 Aug 2023 • Shangqing Tu, Zheyuan Zhang, Jifan Yu, Chunyang Li, Siyu Zhang, Zijun Yao, Lei Hou, Juanzi Li

However, few MOOC platforms are providing human or virtual teaching assistants to support learning for massive online students due to the complexity of real-world online education scenarios and the lack of training data.

Language Modelling Question Answering +1

LOIS: Looking Out of Instance Semantics for Visual Question Answering

no code implementations • 26 Jul 2023 • Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Haibo Shi, Haoran Wang

Visual question answering (VQA) has been intensively studied as a multimodal task that requires effort in bridging vision and language to infer answers correctly.

Question Answering Visual Question Answering +1

You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking

1 code implementation • 18 Apr 2023 • Xiyang Wang, Chunyun Fu, JiaWei He, Mingguang Huang, Ting Meng, Siyu Zhang, Hangning Zhou, Ziyao Xu, Chi Zhang

In the classical tracking-by-detection (TBD) paradigm, detection and tracking are separately and sequentially conducted, and data association must be properly performed to achieve satisfactory tracking performance.

3D Multi-Object Tracking Object +3

PCAE: A Framework of Plug-in Conditional Auto-Encoder for Controllable Text Generation

1 code implementation • 7 Oct 2022 • Haoqin Tu, Zhongliang Yang, Jinshuai Yang, Siyu Zhang, Yongfeng Huang

Visualization of the local latent prior well confirms the primary devotion in hidden space of the proposed model.

Text Generation

OnePose: One-Shot Object Pose Estimation without CAD Models

1 code implementation • CVPR 2022 • Jiaming Sun, ZiHao Wang, Siyu Zhang, Xingyi He, Hongcheng Zhao, Guofeng Zhang, Xiaowei Zhou

We propose a new method named OnePose for object pose estimation.

6D Pose Estimation Graph Attention +2

An Efficient End-to-End 3D Voxel Reconstruction based on Neural Architecture Search

1 code implementation • 27 Feb 2022 • Yongdong Huang, Yuanzhan Li, Xulong Cao, Siyu Zhang, Shen Cai, Ting Lu, Jie Wang, Yuqi Liu

However, many previous works employ neural networks with a fixed architecture and size to represent different 3D objects, which leads to excessive network parameters for simple objects and limited reconstruction accuracy for complex objects.

Binary Classification Neural Architecture Search +1

High-fidelity 3D Model Compression based on Key Spheres

1 code implementation • 19 Jan 2022 • Yuanzhan Li, Yuqi Liu, Yujie Lu, Siyu Zhang, Shen Cai, Yanting Zhang

Compared to previous works, our method achieves high-fidelity, high-compression 3D object coding and reconstruction.

Model Compression Object +1

Provably Secure Generative Linguistic Steganography

1 code implementation • Findings (ACL) 2021 • Siyu Zhang, Zhongliang Yang, Jinshuai Yang, Yongfeng Huang

Generative linguistic steganography mainly utilized language models and applied steganographic sampling (stegosampling) to generate high-security steganographic text (stegotext).

Language Modelling Linguistic steganography

SN-Graph: a Minimalist 3D Object Representation for Classification

1 code implementation • 31 May 2021 • Siyu Zhang, Hui Cao, Yuqi Liu, Shen Cai, Yanting Zhang, Yuanzhan Li, Xiaoyu Chi

Using deep learning techniques to process 3D objects has achieved many successes.

Classification Object

You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking

no code implementations • ICCV 2021 • Jiaming Sun, Yiming Xie, Siyu Zhang, Linghao Chen, Guofeng Zhang, Hujun Bao, Xiaowei Zhou

In this work, we propose a novel system for integrated 3D object detection and tracking, which uses a dynamic object occupancy map and previous object states as spatial-temporal memory to assist object detection in future frames.

3D Object Detection Object +2

A Fast Hybrid Cascade Network for Voxel-based 3D Object Classification

1 code implementation • 9 Nov 2020 • Ji Luo, Hui Cao, Jie Wang, Siyu Zhang, Shen Cai

Voxel-based 3D object classification has been thoroughly studied in recent years.

3D Object Classification Classification +1

Unsupervised Deep Representation Learning and Few-Shot Classification of PolSAR Images

no code implementations • 27 Jun 2020 • Lamei Zhang, Siyu Zhang, Bin Zou, Hongwei Dong

To handle this problem, in this paper, learning transferrable representations from unlabeled PolSAR data through convolutional architectures is explored for the first time.

Contrastive Learning General Classification +3

Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

1 code implementation • CVPR 2020 • Jiaming Sun, Linghao Chen, Yiming Xie, Siyu Zhang, Qinhong Jiang, Xiaowei Zhou, Hujun Bao

In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images.

Ranked #3 on 3D Object Detection From Stereo Images on KITTI Cyclists Moderate

3D Object Detection From Stereo Images Disparity Estimation +2

InSphereNet: a Concise Representation and Classification Method for 3D Object

1 code implementation • 25 Dec 2019 • Hui Cao, Haikuan Du, Siyu Zhang, Shen Cai

Unlike previous methods that use points, voxels, or multi-view images as inputs to a deep neural network (DNN), the proposed method constructs a class of more representative features, named infilling spheres, from a signed distance field (SDF).

3D Object Classification Classification +1

Automatic Design of CNNs via Differentiable Neural Architecture Search for PolSAR Image Classification

no code implementations • 16 Nov 2019 • Hongwei Dong, Siyu Zhang, Bin Zou, Lamei Zhang

With DAS, the weight parameters and architecture parameters (which correspond to the hyperparameters but not the topologies) can be optimized by stochastic gradient descent during training.

Feature Engineering General Classification +2
