1 code implementation • 27 Dec 2024 • Jiaxin Gao, WenBo Hu, Yuntian Chen
Revisiting PCA for Time Series Reduction in Temporal Dimension
Deep learning has significantly advanced time series analysis (TSA), enabling the extraction of complex patterns for tasks like classification, forecasting, and regression.
no code implementations • 19 Dec 2024 • Xiaoning Dong, WenBo Hu, Wei Xu, Tianxing He
It then employs a simple assistive task such as a masked language model task or an element lookup by position task to encode the semantics of the masked keywords.
1 code implementation • 19 Dec 2024 • Zijun Chen, WenBo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong
This paper investigates representative MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning, as well as before and after multimodal training of the base LLMs.
no code implementations • 4 Dec 2024 • Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, WenBo Hu, Xiaoyu Li, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan
Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data.
1 code implementation • 27 Nov 2024 • Cheng-Fu Yang, Da Yin, WenBo Hu, Nanyun Peng, Bolei Zhou, Kai-Wei Chang
Humans recognize objects after observing only a few examples, a remarkable capability enabled by their inherent language understanding of the real-world environment.
no code implementations • 18 Nov 2024 • Dadong Jiang, Zhihui Ke, Xiaobo Zhou, Zhi Hou, Xianghui Yang, WenBo Hu, Tie Qiu, Chunchao Guo
To address the above issue, we design a plug-and-play module called TimeFormer that equips existing deformable 3D Gaussians reconstruction methods with the ability to implicitly model motion patterns from a learning perspective.
no code implementations • 10 Oct 2024 • WenBo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng
In this paper, we introduce a multimodal retrieval-augmented generation benchmark, MRAG-Bench, in which we systematically identify and categorize scenarios where visually augmented knowledge is better than textual knowledge, for instance, more images from varying viewpoints.
no code implementations • 11 Sep 2024 • Sijie Zhao, WenBo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan
This paper presents a novel framework for converting 2D videos to immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences.
1 code implementation • 3 Sep 2024 • Wangbo Yu, Jinbo Xing, Li Yuan, WenBo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, Yonghong Tian
Our method takes advantage of the powerful generation capabilities of video diffusion model and the coarse 3D clues offered by point-based representation to generate high-quality video frames with precise camera pose control.
1 code implementation • 3 Sep 2024 • WenBo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan
Our training approach enables the model to generate depth sequences with variable lengths at one time, up to 110 frames, and harvest both precise depth details and rich content diversity from realistic and synthetic datasets.
1 code implementation • 30 May 2024 • Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, WenBo Hu, Ying Shan
Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them, which will take huge computational resources for training to bridge the gap even with the T2I models as initialization.
1 code implementation • 29 May 2024 • WenBo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang
This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?
no code implementations • 28 May 2024 • Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, WenBo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan
This approach reduces the need to design various algorithms for different types of Gaussian manipulation.
1 code implementation • 3 May 2024 • Junchen Liu, WenBo Hu, Zhuo Yang, Jianteng Chen, Guoliang Wang, Xiaoxue Chen, Yantong Cai, Huan-ang Gao, Hao Zhao
Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure.
1 code implementation • 22 Apr 2024 • Haoyi Qiu, WenBo Hu, Zi-Yi Dou, Nanyun Peng
Our work also highlights the critical balance between faithfulness and coverage of model outputs, and encourages future works to address hallucinations in LVLMs while keeping their outputs informative.
no code implementations • CVPR 2024 • Haoyuan Wang, WenBo Hu, Lei Zhu, Rynson W. H. Lau
Our method has two stages: the geometry of the target object and the pre-filtered environmental radiance fields are reconstructed in the first stage, and materials of the target object are estimated in the second stage with the proposed NeP and material-aware cone sampling strategy.
no code implementations • 22 Mar 2024 • Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.
no code implementations • 17 Mar 2024 • Zhihao Liang, Qi Zhang, WenBo Hu, Ying Feng, Lei Zhu, Kui Jia
This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels.
no code implementations • 15 Mar 2024 • Tian-Xing Xu, WenBo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang
3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering.
no code implementations • 3 Dec 2023 • Ying Liu, Peng Cui, WenBo Hu, Richang Hong
Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.
no code implementations • 19 Oct 2023 • Bangbang Yang, Wenqi Dong, Lin Ma, WenBo Hu, Xiao Liu, Zhaopeng Cui, Yuewen Ma
To ensure that textures are meaningful and aligned with the scene, we develop a novel coarse-to-fine panoramic texture generation approach with dual texture alignment, which considers both the geometry and texture cues of the captured scenes.
no code implementations • 18 Oct 2023 • Guande He, Peng Cui, Jianfei Chen, WenBo Hu, Jun Zhu
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs.
no code implementations • 10 Oct 2023 • Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, WenBo Hu, Long Quan, Ying Shan, Yonghong Tian
Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods.
1 code implementation • 19 Aug 2023 • WenBo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence.
no code implementations • ICCV 2023 • WenBo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, Yuewen Ma
Despite the tremendous progress in neural radiance fields (NeRF), we still face a dilemma of the trade-off between quality and efficiency, e.g., MipNeRF presents fine-detailed and anti-aliased renderings but takes days for training, while Instant-ngp can accomplish the reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions due to ignoring the sampling area.
1 code implementation • 30 May 2023 • Jiaxin Gao, WenBo Hu, Yuntian Chen
Long-term time series forecasting (LTSF) is a crucial aspect of modern society, playing a pivotal role in facilitating long-term planning and developing early warning systems.
no code implementations • 16 May 2023 • Youze Wang, WenBo Hu, Richang Hong
Multimodal learning involves developing models that can integrate information from various sources like images and texts.
no code implementations • 3 Apr 2023 • WenBo Hu, Hongjian Zhan, Xinchen Ma, Cong Liu, Bing Yin, Yue Lu
In the field of historical manuscript research, scholars frequently encounter novel symbols in ancient texts, investing considerable effort in their identification and documentation.
no code implementations • 23 Mar 2023 • WenBo Hu, Xin Sun, Qiang Liu, Le Wu, Liang Wang
To address this, we evaluate the quality of propensity scores from the perspective of uncertainty calibration, proposing the use of expected calibration error (ECE) as a measure of propensity-score quality.
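As a rough illustration of the metric named above, the following is a minimal sketch of a standard equal-width binned ECE applied to binary outcomes such as treatment indicators. It is not taken from the paper: the function name `expected_calibration_error`, the `n_bins` parameter, and the binning scheme are assumptions chosen for illustration, and the authors' exact estimator may differ.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary outcomes (illustrative sketch, not the paper's code).

    probs  : predicted probabilities in [0, 1], e.g. estimated propensity scores
    labels : observed binary outcomes (0 or 1), e.g. treatment indicators
    n_bins : number of equal-width bins over [0, 1] (an assumed default)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one sample is required")

    # Assign each sample to an equal-width bin; p == 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    # ECE is the sample-weighted average of |mean predicted prob - observed rate|.
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - observed_rate)
    return ece
```

A value near zero means that, within each bin, the predicted probabilities roughly match the observed frequencies; larger values indicate miscalibrated scores.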
1 code implementation • SIGGRAPH 2022 • Menghan Xia, WenBo Hu, Tien-Tsin Wong, Jue Wang
Our key insight is that several carefully located anchors could approximately represent the color distribution of an image, and conditioned on the anchor colors, we can predict the image color in a deterministic manner by utilizing internal correlation.
no code implementations • 5 Oct 2022 • Jiaxin Gao, WenBo Hu, Dongxiao Zhang, Yuntian Chen
Accurate electrical load forecasting is beneficial for better scheduling of electricity generation and saving electrical energy.
1 code implementation • 22 Apr 2022 • Runzhe Zhu, Ling Yin, Mingze Yang, Fei Wu, Yuncheng Yang, WenBo Hu
However, existing public datasets do not include images obtained by drones at different heights, and the types of scenes are relatively homogeneous, which yields issues in assessing a model's capability to adapt to complex and changing scenes.
no code implementations • 29 Jan 2022 • Jinbo Xing, WenBo Hu, Tien-Tsin Wong
In this paper, we propose a scale-Arbitrary Invertible image Downscaling Network (AIDN), to natively downscale HR images with arbitrary scale factors.
1 code implementation • 16 Jul 2021 • WenBo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong
Based on this representation, we further propose a spatial-temporal conditional directed graph convolution to leverage varying non-local dependence for different poses by conditioning the graph topology on input poses.
Ranked #17 on 3D Human Pose Estimation on Human3.6M
no code implementations • ICCV 2021 • Fangneng Zhan, Changgong Zhang, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao
Accurate lighting estimation is challenging yet critical to many computer vision and computer graphics tasks such as high-dynamic-range (HDR) relighting.
1 code implementation • 2 Jun 2021 • Yingtao Luo, Qiang Liu, Yuntian Chen, WenBo Hu, Tian Tian, Jun Zhu
Especially, the discovery of PDEs with highly nonlinear coefficients from low-quality data remains largely under-addressed.
1 code implementation • 18 May 2021 • Wenkai Li, WenBo Hu, Ting Chen, Ning Chen, Cheng Feng
We also leverage a graph learning module to learn a sparse adjacency matrix to explicitly capture the stable interrelation structure among multiple time series channels for the interpretable pattern reconstruction of interrelated channels.
no code implementations • 28 Mar 2021 • Peng Cui, Zhijie Deng, WenBo Hu, Jun Zhu
It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments.
1 code implementation • CVPR 2021 • WenBo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong
Via the BPM, complementary 2D and 3D information can interact with each other in multiple architectural levels, such that advantages in these two visual domains can be combined for better scene recognition.
Ranked #19 on Semantic Segmentation on ScanNet
1 code implementation • 20 Feb 2021 • Fangneng Zhan, Yingchen Yu, Changgong Zhang, Rongliang Wu, WenBo Hu, Shijian Lu, Feiying Ma, Xuansong Xie, Ling Shao
This paper presents Geometric Mover's Light (GMLight), a lighting estimation framework that employs a regression network and a generative projector for effective illumination estimation.
1 code implementation • ICCV 2021 • Menghan Xia, WenBo Hu, Xueting Liu, Tien-Tsin Wong
Existing halftoning algorithms usually drop colors and fine details when dithering color images with binary dot patterns, which makes it extremely difficult to recover the original information.
no code implementations • 16 Dec 2020 • Qingyi Pan, WenBo Hu, Jun Zhu
Though deep learning methods have recently been developed to give superior forecasting results, it is crucial to improve the interpretability of time series models.
1 code implementation • 3 Sep 2020 • Wenbo Hu, Menghan Xia, Chi-Wing Fu, Tien-Tsin Wong
This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it.
Image and Video Processing • Graphics
no code implementations • 1 Jul 2017 • Wenbo Hu, Lifeng Hua, Lei LI, Hang Su, Tian Wang, Ning Chen, Bo Zhang
This paper presents a Semantic Attribute Modulation (SAM) for language modeling and style variation.
no code implementations • 27 Apr 2015 • Wenbo Hu, Jun Zhu, Bo Zhang
Bayesian max-margin models have shown superiority in various practical applications, such as text categorization, collaborative prediction, social network link prediction and crowdsourcing, and they conjoin the flexibility of Bayesian modeling and predictive strengths of max-margin learning.