no code implementations • 6 May 2025 • Lei Liu, Zhenghao Chen, Dong Xu
To enhance lossless compression, the resulting MoP feature is utilized as a hyperprior to improve conditional entropy modeling.
no code implementations • 29 Apr 2025 • Shiyin Jiang, Zhenghao Chen, Minghao Han, Xingyu Zhou, Leheng Zhang, Shuhang Gu
Building upon HDC, we introduce a novel end-to-end optimized neural stereo video compression framework, which integrates HDC-based modules into key coding operations, including cross-view feature extraction and reconstruction (HDC-FER) and cross-view entropy modeling (HDC-EM).
no code implementations • 13 Mar 2025 • Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, Feng Xia
However, existing surveys have not provided a unified summary of the wide range of model architectures in this field, nor have they given detailed summaries of works in feature extraction and datasets.
no code implementations • 12 Feb 2025 • Chuhan Wang, Zhenghao Chen, Jean Y. H. Yang, Jinman Kim
The proposed GCU addresses resolution loss in the U-shaped backbone by preserving global contextual features and fine-grained details during multiscale downsampling.
1 code implementation • 10 Jan 2025 • Ziheng Wu, Zhenghao Chen, Ruipu Luo, Can Zhang, Yuan Gao, Zhentao He, Xian Wang, Haoran Lin, Minghui Qiu
Recently, vision-language models have made remarkable progress, demonstrating outstanding capabilities in various tasks such as image captioning and video understanding.
no code implementations • 8 Jan 2025 • Lei Liu, Zhenghao Chen, Zhihao Hu, Dong Xu
While most existing neural image compression (NIC) and neural video compression (NVC) methodologies have achieved remarkable success, their optimization is primarily focused on human visual perception.
no code implementations • 27 Nov 2024 • Yiming Wu, Huan Wang, Zhenghao Chen, Dong Xu
Additionally, we propose an Individual Content and Motion Dynamics (ICMD) Consistency Loss so that VDMini (the student) achieves generation performance comparable to that of the larger VDM (the teacher).
no code implementations • 27 Nov 2024 • Lei Liu, Zhenghao Chen, Dong Xu
To effectively reduce structural redundancy across attributes, we apply a progressive coding algorithm to generate hyperprior features, in which we use previously compressed attributes and location as prior information.
no code implementations • 4 Oct 2024 • Sicheng Yu, Chengkai Jin, Huanyu Wang, Zhenghao Chen, Sheng Jin, Zhongrong Zuo, Xiaolei Xu, Zhenbang Sun, Bingni Zhang, Jiawei Wu, Hao Zhang, Qianru Sun
Video Large Language Models (Video-LLMs) have made remarkable progress in video understanding tasks.
1 code implementation • 20 Jun 2024 • Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai
As a result, it enables a better training of models and a more accurate assessment of the real-world forecasting capabilities of TSF models, pushing them closer to in-situ applications.
no code implementations • 2 Jun 2024 • Lei Liu, Zhihao Hu, Zhenghao Chen
To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks.
1 code implementation • 24 May 2024 • Zicheng Wang, Zhenghao Chen, Yiming Wu, Zhen Zhao, Luping Zhou, Dong Xu
In this study, we introduce PoinTramba, a pioneering hybrid framework that synergizes the analytical power of Transformer with the remarkable computational efficiency of Mamba for enhanced point cloud analysis.
no code implementations • 7 May 2024 • Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu
Content-adaptive compression is crucial for enhancing the adaptability of the pre-trained neural codec for various contents.
1 code implementation • 6 May 2024 • Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai
To mitigate this issue, we introduce an efficient neural codec, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data to significantly reduce data storage cost, making AI-based meteorological research portable to researchers.
no code implementations • 2 Apr 2024 • Qianhui Zhao, Fang Liu, Li Zhang, Yang Liu, Zhen Yan, Zhenghao Chen, Yufei Zhou, Jing Jiang, Ge Li
Automated generation of feedback on programming assignments holds significant benefits for programming education, especially when it comes to advanced assignments.
no code implementations • 4 Dec 2023 • Ling Yang, Zhanyu Wang, Zhenghao Chen, Xinyu Liang, Luping Zhou
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks, yet their application in medical imaging is nascent, lacking tailored models.
1 code implementation • 27 Nov 2023 • Xinhui Liu, Zhenghao Chen, Luping Zhou, Dong Xu, Wei Xi, Gairui Bai, Yihan Zhao, Jizhong Zhao
Conventional Federated Domain Adaptation (FDA) approaches usually demand an abundance of assumptions, which makes them significantly less feasible for real-world situations and introduces security hazards.
no code implementations • 4 Sep 2023 • Xianghui Yang, Guosheng Lin, Zhenghao Chen, Luping Zhou
Recent neural-network-based surface reconstruction methods can be roughly divided into two categories: one warps templates explicitly, and the other represents 3D surfaces implicitly.
2 code implementations • CVPR 2023 • Xianghui Yang, Guosheng Lin, Zhenghao Chen, Luping Zhou
Deep neural networks (DNNs) are now widely applied to 3D surface reconstruction tasks, and such methods can be further divided into two categories, which respectively warp templates explicitly by moving vertices or represent 3D surfaces implicitly as signed or unsigned distance functions.
1 code implementation • 31 Aug 2022 • ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu
In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.
no code implementations • CVPR 2022 • Zhenghao Chen, Guo Lu, Zhihao Hu, Shan Liu, Wei Jiang, Dong Xu
In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views.
no code implementations • 28 May 2021 • Zhenghao Chen, Shuhang Gu, Feng Zhu, Jing Xu, Rui Zhao
For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature.
no code implementations • ECCV 2020 • Zhihao Hu, Zhenghao Chen, Dong Xu, Guo Lu, Wanli Ouyang, Shuhang Gu
In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder.
3 code implementations • CVPR 2020 • Ziyu Liu, Hongwen Zhang, Zhenghao Chen, Zhiyong Wang, Wanli Ouyang
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics.
Ranked #4 on 3D Action Recognition on Assembly101
no code implementations • 23 Jan 2019 • Yeu-Chern Harn, Zhenghao Chen, Vladimir Jojic
In this work, we propose a composition/decomposition framework for adversarially training generative models on composed data - data where each sample can be thought of as being constructed from a fixed number of components.
no code implementations • 9 Jul 2017 • Zhenghao Chen, Jianlong Zhou, Xiuying Wang
This method combines three main parts: a back-end data model; a neural net algorithm, comprising the clustering method Self-Organizing Map (SOM) and the prediction approach Recurrent Neural Net (RNN), for extracting the features; and a solid front-end that displays the results to users through an interactive system.
1 code implementation • 9 Jul 2013 • Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, Daphne Koller
In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students.
no code implementations • NeurIPS 2011 • Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification.
no code implementations • NeurIPS 2010 • Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng
Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture.