Search Results for author: Haiyang Sun

Found 33 papers, 10 papers with code

SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction

no code implementations • 16 Apr 2025 • Xia Wang, Haiyang Sun, Tiantian Cao, Yueying Sun, Min Feng

Moiré patterns, resulting from aliasing between object light signals and camera sampling frequencies, often degrade image quality during capture.

Decoder Image Generation +1

CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving

no code implementations • 28 Mar 2025 • Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu

Recent progress in driving video generation has shown significant potential for enhancing self-driving systems by providing scalable and controllable training data.

3D Generation Autonomous Driving +1

BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation

no code implementations • 5 Mar 2025 • Gangwei Xu, Haotong Lin, Zhaoxing Zhang, Hongcheng Luo, Haiyang Sun, Xin Yang

Event cameras deliver visual information characterized by a high dynamic range and high temporal resolution, offering significant advantages in estimating optical flow for complex lighting conditions and fast-moving objects.

Event-based Optical Flow Optical Flow Estimation

Improved VMD Based Remote Heartbeat Estimation Utilizing 60GHz mmWave Radar

no code implementations • 16 Feb 2025 • Boyuan Gu, Yanhui Yang, Siyu You, Haiyang Sun, Jiahui Sun, Shisheng Guo

This study introduces an improved VMD-based signal decomposition methodology for non-contact heartbeat estimation using millimeter-wave (mmWave) radar.

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching

no code implementations • 16 Feb 2025 • Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin

To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching.

Language Modeling Language Modelling +1

Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models

1 code implementation • 13 Feb 2025 • Yiheng Liu, Xiaohui Gao, Haiyang Sun, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu

We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in LLMs.

CMamba: Learned Image Compression with State Space Models

no code implementations • 7 Feb 2025 • Zhuojie Wu, Heming Du, Shuyun Wang, Ming Lu, Haiyang Sun, Yandong Guo, Xin Yu

In this paper, we propose a hybrid Convolution and State Space Models (SSMs) based image compression framework, termed CMamba, to achieve superior rate-distortion performance with low computational complexity.

Image Compression State Space Models

LinBridge: A Learnable Framework for Interpreting Nonlinear Neural Encoding Models

no code implementations • 26 Oct 2024 • Xiaohui Gao, Yue Cheng, Peiyang Li, Yijie Niu, Yifan Ren, Yiheng Liu, Haiyang Sun, Zhuoyi Li, Weiwei Xing, Xintao Hu

LinBridge posits that the nonlinear mapping between ANN representations and neural responses can be factorized into a linear inherent component that approximates the complex nonlinear relationship, and a mapping bias that captures sample-selective nonlinearity.

Self-Supervised Learning

Brain-like Functional Organization within Large Language Models

no code implementations • 25 Oct 2024 • Haiyang Sun, Lin Zhao, Zihao Wu, Xiaohui Gao, Yutao Hu, Mengfei Zuo, Wei Zhang, Junwei Han, Tianming Liu, Xintao Hu

In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs), the foundational organizational structure of the human brain.

Diverse Sign Language Translation

1 code implementation • 25 Oct 2024 • Xin Shen, Lei Shen, Shaozu Yuan, Heming Du, Haiyang Sun, Xin Yu

In this work, we introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos.

Sign Language Translation Translation +1

DiVE: DiT-based Video Generation with Enhanced Control

no code implementations • 3 Sep 2024 • Junpeng Jiang, Gangyi Hong, Lijun Zhou, Enhui Ma, Hengtong Hu, Xia Zhou, Jie Xiang, Fan Liu, Kaicheng Yu, Haiyang Sun, Kun Zhan, Peng Jia, Miao Zhang

Generating high-fidelity, temporally consistent videos in autonomous driving scenarios faces a significant challenge, e.g., problematic maneuvers in corner cases.

Autonomous Driving Video Generation

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

no code implementations • 7 Jun 2024 • Xiaobiao Du, Haiyang Sun, Shuyun Wang, Zhuojie Wu, Hongwei Sheng, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu

(1) High-Volume: 2,500 cars are meticulously scanned by 3D scanners, obtaining car images and point clouds with real-world dimensions; (2) High-Quality: Each car is captured in an average of 200 dense, high-resolution 360-degree RGB-D views, enabling high-fidelity 3D reconstruction; (3) High-Diversity: The dataset contains various cars from over 100 brands, collected under three distinct lighting conditions, including reflective, standard, and dark.

3D Reconstruction

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

no code implementations • 3 Jun 2024 • Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency.

Autonomous Driving Video Generation

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

1 code implementation • 28 Mar 2024 • Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao

However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes.

3D dense captioning Dense Captioning

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting

2 code implementations • 2 Jan 2024 • Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng

Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes.

Autonomous Driving NeRF +1

SVFAP: Self-supervised Video Facial Affect Perceiver

1 code implementation • 31 Dec 2023 • Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, JianHua Tao

Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction.

Dynamic Facial Expression Recognition Emotion Recognition +2

OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

no code implementations • 12 Dec 2023 • Hu Zhang, Jianhua Xu, Tao Tang, Haiyang Sun, Xin Yu, Zi Huang, Kaicheng Yu

OpenSight utilizes 2D-3D geometric priors for the initial discernment and localization of generic objects, followed by a more specific semantic interpretation of the detected objects.

cross-modal alignment object-detection +1

GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition

1 code implementation • 7 Dec 2023 • Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, JianHua Tao

To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition.

Facial Emotion Recognition Micro Expression Recognition +3

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition

no code implementations • 12 Jun 2023 • Haiyang Sun, FuLin Zhang, Yingying Gao, Zheng Lian, Shilei Zhang, Junlan Feng

Considering comprehensiveness, we partition speech knowledge into Textual-related Emotional Content (TEC) and Speech-related Emotional Content (SEC), capturing cues from both semantic and acoustic perspectives, and we design a new architecture search space to fully leverage them.

Quantization Speech Emotion Recognition

Fully Automated End-to-End Fake Audio Detection

no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

Existing fake audio detection systems often rely on expert experience to design the acoustic features or to manually set the hyperparameters of the network structure.

Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge

no code implementations • 23 Jul 2022 • Haiyang Sun, Zheng Lian, Bin Liu, JianHua Tao, Licai Sun, Cong Cai

In this paper, we propose the solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.

Multi-Task Learning Vocal Bursts Valence Prediction
