no code implementations • 16 Apr 2025 • Xia Wang, Haiyang Sun, Tiantian Cao, Yueying Sun, Min Feng
Moir\'e patterns, resulting from aliasing between object light signals and camera sampling frequencies, often degrade image quality during capture.
no code implementations • 14 Apr 2025 • Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen
Building on PAR, we propose PALLE, a two-stage TTS system that leverages PAR for initial generation followed by NAR refinement.
no code implementations • 28 Mar 2025 • Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu
Recent progress in driving video generation has shown significant potential for enhancing self-driving systems by providing scalable and controllable training data.
no code implementations • 5 Mar 2025 • Gangwei Xu, Haotong Lin, Zhaoxing Zhang, Hongcheng Luo, Haiyang Sun, Xin Yang
Event cameras deliver visual information characterized by a high dynamic range and high temporal resolution, offering significant advantages in estimating optical flow for complex lighting conditions and fast-moving objects.
no code implementations • 16 Feb 2025 • Boyuan Gu, Yanhui Yang, Siyu You, Haiyang Sun, Jiahui Sun, Shisheng Guo
This study introduces an improved VMD based signal decomposition methodology for non-contact heartbeat estimation using millimeterwave (mmWave) radar.
no code implementations • 16 Feb 2025 • Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin
To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching.
1 code implementation • 13 Feb 2025 • Yiheng Liu, Xiaohui Gao, Haiyang Sun, Bao Ge, Tianming Liu, Junwei Han, Xintao Hu
We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in LLM.
no code implementations • 7 Feb 2025 • Zhuojie Wu, Heming Du, Shuyun Wang, Ming Lu, Haiyang Sun, Yandong Guo, Xin Yu
In this paper, we propose a hybrid Convolution and State Space Models (SSMs) based image compression framework, termed \textit{CMamba}, to achieve superior rate-distortion performance with low computational complexity.
no code implementations • 20 Dec 2024 • Yifan Yang, Ziyang Ma, Shujie Liu, Jinyu Li, Hui Wang, Lingwei Meng, Haiyang Sun, Yuzhe Liang, Ruiyang Xu, Yuxuan Hu, Yan Lu, Rui Zhao, Xie Chen
This paper introduces Interleaved Speech-Text Language Model (IST-LM) for streaming zero-shot Text-to-Speech (TTS).
no code implementations • 30 Nov 2024 • Tianyang Zhong, Zhenyuan Yang, Zhengliang Liu, Ruidong Zhang, Yiheng Liu, Haiyang Sun, Yi Pan, Yiwei Li, Yifan Zhou, Hanqi Jiang, JunHao Chen, Tianming Liu
Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity.
no code implementations • 26 Oct 2024 • Xiaohui Gao, Yue Cheng, Peiyang Li, Yijie Niu, Yifan Ren, Yiheng Liu, Haiyang Sun, Zhuoyi Li, Weiwei Xing, Xintao Hu
LinBridge posits that the nonlinear mapping between ANN representations and neural responses can be factorized into a linear inherent component that approximates the complex nonlinear relationship, and a mapping bias that captures sample-selective nonlinearity.
no code implementations • 25 Oct 2024 • Haiyang Sun, Lin Zhao, Zihao Wu, Xiaohui Gao, Yutao Hu, Mengfei Zuo, Wei zhang, Junwei Han, Tianming Liu, Xintao Hu
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs), the foundational organizational structure of the human brain.
1 code implementation • 25 Oct 2024 • Xin Shen, Lei Shen, Shaozu Yuan, Heming Du, Haiyang Sun, Xin Yu
In this work, we introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos.
no code implementations • 3 Sep 2024 • Junpeng Jiang, Gangyi Hong, Lijun Zhou, Enhui Ma, Hengtong Hu, Xia Zhou, Jie Xiang, Fan Liu, Kaicheng Yu, Haiyang Sun, Kun Zhan, Peng Jia, Miao Zhang
Generating high-fidelity, temporally consistent videos in autonomous driving scenarios faces a significant challenge, e. g. problematic maneuvers in corner cases.
1 code implementation • 24 Jul 2024 • Xiaobiao Du, Haiyang Sun, Ming Lu, Tianqing Zhu, Xin Yu
With this dataset, we make the generative model more robust to cars.
no code implementations • 7 Jun 2024 • Xiaobiao Du, Haiyang Sun, Shuyun Wang, Zhuojie Wu, Hongwei Sheng, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu
(1) \textbf{High-Volume}: 2, 500 cars are meticulously scanned by 3D scanners, obtaining car images and point clouds with real-world dimensions; (2) \textbf{High-Quality}: Each car is captured in an average of 200 dense, high-resolution 360-degree RGB-D views, enabling high-fidelity 3D reconstruction; (3) \textbf{High-Diversity}: The dataset contains various cars from over 100 brands, collected under three distinct lighting conditions, including reflective, standard, and dark.
no code implementations • 4 Jun 2024 • Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang
Secondly, we propose an Uncertainty-guided Query Denoising strategy to further enhance the training process.
no code implementations • 3 Jun 2024 • Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu
Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency.
2 code implementations • 26 Apr 2024 • Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels.
1 code implementation • 28 Mar 2024 • Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao
However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes.
no code implementations • 22 Mar 2024 • Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, JianHua Tao
Multimodal fusion is a significant method for most multimodal tasks.
no code implementations • 18 Feb 2024 • Kang Chen, Zheng Lian, Haiyang Sun, Rui Liu, Jiangyan Yi, Bin Liu, JianHua Tao
Deception detection has attracted increasing attention due to its importance in real-world scenarios.
no code implementations • 2 Jan 2024 • Tao Tang, Dafeng Wei, Zhengyu Jia, Tian Gao, Changwei Cai, Chengkai Hou, Peng Jia, Kun Zhan, Haiyang Sun, Jingchen Fan, Yixing Zhao, Fu Liu, Xiaodan Liang, Xianpeng Lang, Yang Wang
Furthermore, there lack of well-formed retrieval datasets for effective evaluation.
2 code implementations • 2 Jan 2024 • Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng
Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes.
1 code implementation • 31 Dec 2023 • Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, JianHua Tao
Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction.
Ranked #6 on
Dynamic Facial Expression Recognition
on FERV39k
Dynamic Facial Expression Recognition
Emotion Recognition
+2
no code implementations • 12 Dec 2023 • Hu Zhang, Jianhua Xu, Tao Tang, Haiyang Sun, Xin Yu, Zi Huang, Kaicheng Yu
OpenSight utilizes 2D-3D geometric priors for the initial discernment and localization of generic objects, followed by a more specific semantic interpretation of the detected objects.
1 code implementation • 7 Dec 2023 • Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, JianHua Tao
To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition.
no code implementations • 12 Jun 2023 • Haiyang Sun, FuLin Zhang, Yingying Gao, Zheng Lian, Shilei Zhang, Junlan Feng
Considering comprehensiveness, we partition speech knowledge into Textual-related Emotional Content (TEC) and Speech-related Emotional Content (SEC), capturing cues from both semantic and acoustic perspectives, and we design a new architecture search space to fully leverage them.
3 code implementations • 18 Apr 2023 • Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu
The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure.
no code implementations • 23 Jul 2022 • Haiyang Sun, Zheng Lian, Bin Liu, JianHua Tao, Licai Sun, Cong Cai
In this paper, we propose the solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
1 code implementation • 30 May 2022 • Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, TingTing Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang
There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR.
no code implementations • 25 Mar 2022 • Haiyang Sun, Zheng Lian, Bin Liu, Ying Li, Licai Sun, Cong Cai, JianHua Tao, Meng Wang, Yuan Cheng
Speech emotion recognition (SER) is an important research topic in human-computer interaction.