1 code implementation • 24 Dec 2024 • Zhibin Wang, Yanxin Cai, Jiayi Zhou, Yangming Zhang, Tianyu Li, Wei Li, Xun Liu, Guoqing Wang, Yang Yang
Efficient compression of remote sensing imagery is a critical solution to alleviate these burdens on satellites.
no code implementations • 4 Nov 2024 • Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan
Furthermore, MVPaint employs a UVR module to improve texture quality in the UV space: it first performs UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm that revises spatial texturing discontinuities caused by UV unwrapping.
no code implementations • 18 Oct 2024 • Zhibin Wang, Shipeng Li, YuHang Zhou, Xue Li, Rong Gu, Nguyen Cam-Tu, Chen Tian, Sheng Zhong
In this paper, we revisit SLO and goodput metrics in LLM serving and propose a unified metric framework, smooth goodput, which incorporates both SLOs and goodput to reflect the nature of user experience in LLM serving.
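The snippet does not spell out the formula; as a rough, hypothetical illustration (not the paper's definition), a per-request score could degrade smoothly with SLO violation instead of dropping to zero:

```python
import math

def smooth_goodput(latencies_s, slo_s, decay=5.0):
    """Hypothetical smooth-goodput score: requests within the SLO count fully,
    while late requests contribute a smoothly decaying partial credit instead
    of being discarded outright (illustrative only, not the paper's formula)."""
    score = 0.0
    for lat in latencies_s:
        if lat <= slo_s:
            score += 1.0
        else:
            # exponential decay in the relative SLO violation
            score += math.exp(-decay * (lat - slo_s) / slo_s)
    return score / max(len(latencies_s), 1)

print(smooth_goodput([0.8, 1.0, 1.4, 2.0], slo_s=1.0))
```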
no code implementations • 30 Sep 2024 • Yabing Wang, Le Wang, Qiang Zhou, Zhibin Wang, Hao Li, Gang Hua, Wei Tang
Then, we take these semantic slots as internal features and leverage them to interact with the visual features.
1 code implementation • 26 Aug 2024 • Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji
We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.
no code implementations • 30 Jul 2024 • Zheng Lin, Zhenxing Niu, Zhibin Wang, Yinghui Xu
MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination.
1 code implementation • 23 Jul 2024 • Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
However, the quadratic complexity of the vision encoder in MLLMs constrains the resolution of input images.
Ranked #162 on Visual Question Answering on MM-Vet
1 code implementation • 31 May 2024 • Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen
The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, making it widely preferred in various applications.
1 code implementation • 28 Jan 2024 • Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan
At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings.
1 code implementation • CVPR 2024 • Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu
This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs.
no code implementations • 27 Nov 2023 • Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang
Next, we introduce ChartLlama, a multi-modal large language model trained on the dataset we created.
1 code implementation • 12 Nov 2023 • Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi
Our experiments demonstrate that preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding.
Ranked #170 on Visual Question Answering on MM-Vet
no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen
In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.
Ranked #145 on Visual Question Answering on MM-Vet
no code implementations • 14 Aug 2023 • Chaohui Yu, Qiang Zhou, Zhibin Wang, Fan Wang
Second, we propose an align-guided contrastive loss to refine the alignment of vision and text embeddings.
no code implementations • 4 Aug 2023 • Qiang Zhou, Chaohui Yu, Jingliang Li, Yuang Liu, Jing Wang, Zhibin Wang
to provide additional consistency constraints, which increases GPU memory consumption and complicates the model's structure and training pipeline.
no code implementations • 27 Jul 2023 • Jingliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang
To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously.
no code implementations • 26 Jul 2023 • Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang
To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss to adaptively drive the NeRF's geometry to align with the shape of the sparse 3D points.
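One way such a guidance loss could be written, as a sketch only (the paper's exact formulation may differ), is a one-sided Chamfer-style term that pulls rendered surface points toward their nearest sparse points:

```python
import torch

def point_guidance_loss(rendered_pts, sparse_pts):
    """One-sided Chamfer-style guidance: pull each point on the rendered
    surface toward its nearest neighbour in the sparse 3D point cloud.
    Illustrative sketch only.

    rendered_pts: (N, 3) points obtained from rendered depth along camera rays
    sparse_pts:   (M, 3) sparse 3D points, e.g. from structure-from-motion
    """
    d = torch.cdist(rendered_pts, sparse_pts)   # (N, M) pairwise distances
    nearest, _ = d.min(dim=1)                   # distance to the closest sparse point
    return nearest.mean()

rendered = torch.rand(1024, 3, requires_grad=True)
sparse = torch.rand(200, 3)
point_guidance_loss(rendered, sparse).backward()
```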
no code implementations • 5 Jun 2023 • Lei Chen, Fei Du, Yuan Hu, Fan Wang, Zhibin Wang
Recurrent predictions of future atmospheric fields are first performed at 1.40625-degree resolution, and a diffusion-based super-resolution model is then leveraged to recover high spatial resolution and finer-scale atmospheric details.
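As a minimal sketch of the two-stage pipeline described above, with `coarse_model` and `sr_diffusion` as hypothetical stand-ins for the trained networks:

```python
def forecast(initial_state, coarse_model, sr_diffusion, steps):
    """Sketch of the two-stage pipeline (hypothetical interfaces):
    1) roll the forecast model forward autoregressively at coarse resolution,
    2) upsample each coarse field with a diffusion-based super-resolution model."""
    coarse_fields = []
    state = initial_state
    for _ in range(steps):
        state = coarse_model(state)       # recurrent low-resolution prediction
        coarse_fields.append(state)
    # recover high-resolution details conditioned on each coarse field
    return [sr_diffusion(field) for field in coarse_fields]

# toy usage with identity-like stand-ins
print(forecast(0.0, coarse_model=lambda s: s + 1, sr_diffusion=lambda f: f * 10, steps=3))
```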
1 code implementation • 26 Apr 2023 • Fangjian Lin, Jianlong Yuan, Sitong Wu, Fan Wang, Zhibin Wang
Interestingly, the ranking of these spatial token mixers also changes under our UniNeXt, suggesting that an excellent spatial token mixer may be stifled by a suboptimal general architecture, which further underscores the importance of studying the general architecture of vision backbones.
no code implementations • 1 Mar 2023 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Fan Wang
In this paper, we propose an end-to-end framework for oriented object detection, which simplifies the model pipeline and obtains superior performance.
no code implementations • CVPR 2023 • Chaohui Yu, Qiang Zhou, Jingliang Li, Jianlong Yuan, Zhibin Wang, Fan Wang
In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS.
no code implementations • 27 Feb 2023 • Qiang Zhou, Yuang Liu, Chaohui Yu, Jingliang Li, Zhibin Wang, Fan Wang
Instead of relabeling each dataset with the unified taxonomy, a category-guided decoding module is designed to dynamically guide predictions to each dataset's taxonomy.
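A hedged sketch of the idea: keep a single unified label space and, at decode time, restrict predictions to the categories of the dataset a sample comes from (function and index names below are hypothetical):

```python
import torch

def decode_for_dataset(unified_logits, dataset_category_ids):
    """Restrict unified-taxonomy logits to one dataset's categories
    (illustrative sketch of category-guided decoding, not the exact module).

    unified_logits:       (B, C_unified) scores over the unified label space
    dataset_category_ids: indices of this dataset's categories in the unified space
    """
    idx = torch.tensor(dataset_category_ids)
    restricted = unified_logits[:, idx]     # scores over this dataset's taxonomy
    local_pred = restricted.argmax(dim=1)   # class id within the dataset
    return idx[local_pred]                  # map back to unified ids

logits = torch.randn(4, 10)
print(decode_for_dataset(logits, dataset_category_ids=[0, 3, 7]))
```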
1 code implementation • CVPR 2023 • Fei Du, Jianlong Yuan, Zhibin Wang, Fan Wang
To this end, we propose an efficient method to correct the mask with a lightweight mask correction network.
no code implementations • 3 Nov 2022 • Yuan Hu, Zhibin Wang, Zhou Huang, Yu Liu
Given a set of polygon queries, the model learns the relations among them and encodes context information from the image to predict the final set of building polygons with fixed vertex numbers.
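To make the fixed-vertex set prediction concrete, here is a minimal, hypothetical sketch of a query-based polygon head (names, shapes, and the single decoder layer are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class PolygonQueryHead(nn.Module):
    """Toy query-based polygon head: each learned query regresses a fixed
    number of vertices plus an objectness score (illustrative sketch only)."""

    def __init__(self, num_queries=50, num_vertices=16, dim=256):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.decoder = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.vertex_head = nn.Linear(dim, num_vertices * 2)   # (x, y) per vertex
        self.score_head = nn.Linear(dim, 1)

    def forward(self, image_features):                        # (B, HW, dim) image context
        b = image_features.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        q = self.decoder(q, image_features)                   # queries attend to the image
        vertices = self.vertex_head(q).sigmoid()              # normalised coords in [0, 1]
        scores = self.score_head(q).sigmoid()
        return vertices.view(b, q.size(1), -1, 2), scores

head = PolygonQueryHead()
verts, scores = head(torch.randn(2, 64, 256))
print(verts.shape, scores.shape)                              # (2, 50, 16, 2) (2, 50, 1)
```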
no code implementations • 19 Oct 2022 • Zhibin Wang, Yapeng Zhao, Yong Zhou, Yuanming Shi, Chunxiao Jiang, Khaled B. Letaief
The rapid advancement of artificial intelligence technologies has given rise to diversified intelligent services, which place unprecedented demands on massive connectivity and gigantic data aggregation.
no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
In this paper, we address monocular depth estimation with deep neural networks.
Ranked #7 on Monocular Depth Estimation on ETH3D
no code implementations • 7 Sep 2022 • Qiang Zhou, Chaohui Yu, Hao Luo, Zhibin Wang, Hao Li
Specifically, MimCo takes a pre-trained contrastive learning model as the teacher model and is pre-trained with two types of learning targets: patch-level and image-level reconstruction losses.
1 code implementation • 30 Aug 2022 • Jianlong Yuan, Qi Qian, Fei Du, Zhibin Wang, Fan Wang, Yifan Liu
Inspired by the recent progress on semantic directions on feature-space, we propose to include augmentations in feature space for efficient distillation.
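As a rough sketch of distillation with feature-space augmentation, the example below perturbs student features along directions scaled by per-channel statistics, a simplification standing in for learned semantic directions:

```python
import torch
import torch.nn.functional as F

def feature_space_distill_loss(student_feats, teacher_feats, strength=0.1):
    """Sketch: perturb student features along directions scaled by their
    per-channel standard deviation (a stand-in for semantic directions)
    before matching the teacher. Illustrative only."""
    std = student_feats.std(dim=0, keepdim=True)          # per-channel variation
    noise = torch.randn_like(student_feats) * std * strength
    augmented = student_feats + noise
    return F.mse_loss(augmented, teacher_feats.detach())

s = torch.randn(8, 256, requires_grad=True)
t = torch.randn(8, 256)
feature_space_distill_loss(s, t).backward()
```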
1 code implementation • 24 Aug 2022 • Jianlong Yuan, Jinchao Ge, Zhibin Wang, Yifan Liu
More specifically, we use the pseudo-labels generated by a mean teacher to supervise the student network to achieve a mutual knowledge distillation between the two branches.
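A minimal sketch of the mean-teacher pseudo-labeling part (confidence threshold and interfaces are assumptions; the mutual distillation between the two branches is omitted):

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of the mean teacher's weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)

def pseudo_label_loss(teacher, student, unlabeled_inputs, threshold=0.95):
    """Sketch: the mean teacher produces pseudo-labels on unlabeled data,
    and only confident predictions supervise the student (illustrative)."""
    with torch.no_grad():
        probs = teacher(unlabeled_inputs).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    logits = student(unlabeled_inputs)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    mask = (conf >= threshold).float()
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

student = torch.nn.Linear(32, 10)
teacher = copy.deepcopy(student)
x = torch.randn(16, 32)
pseudo_label_loss(teacher, student, x).backward()
ema_update(teacher, student)
```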
no code implementations • 2 Aug 2022 • Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li
Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization.
no code implementations • 6 Jun 2022 • Zhibin Wang, Yong Zhou, Yuanming Shi, Weihua Zhuang
We characterize the Pareto boundary of the error-induced gap region to quantify the learning performance trade-off among different FL tasks, based on which we formulate an optimization problem to minimize the sum of error-induced gaps in all cells.
no code implementations • 1 Jun 2022 • Yongtao Ge, Qiang Zhou, Xinlong Wang, Zhibin Wang, Hao Li, Chunhua Shen
Point annotations are considerably more time-efficient than bounding box annotations.
no code implementations • 28 May 2022 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Hao Li
To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN, which mainly consists of PointRPN and PointReg.
no code implementations • 26 May 2022 • Yuan Hu, Lei Chen, Zhibin Wang, Hao Li
We also compare four categories of perturbation methods for ensemble forecasting, i.e., fixed-distribution perturbation, learned-distribution perturbation, MC dropout, and multi-model ensemble.
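Of these, MC dropout is the easiest to sketch: keep dropout layers active at inference time and treat repeated stochastic forward passes as ensemble members (generic sketch with a hypothetical model):

```python
import torch
import torch.nn as nn

def mc_dropout_ensemble(model, inputs, num_members=10):
    """Sketch of MC-dropout ensemble forecasting: re-enable only the dropout
    layers at inference and aggregate repeated stochastic forward passes."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()                          # keep dropout stochastic
    with torch.no_grad():
        members = torch.stack([model(inputs) for _ in range(num_members)])
    return members.mean(dim=0), members.std(dim=0)   # ensemble mean and spread

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 2))
mean, spread = mc_dropout_ensemble(net, torch.randn(4, 8))
print(mean.shape, spread.shape)
```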
1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton van den Hengel
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Ranked #2 on Keypoint Detection on MS COCO
no code implementations • ICCV 2021 • Weitao Chen, Zhibin Wang, Hao Li
A percentage of the image size is often used as the threshold for PCK.
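For reference, PCK with an image-size-based threshold can be computed roughly as follows (a generic sketch; normalisation conventions vary across benchmarks):

```python
import numpy as np

def pck(pred, gt, img_h, img_w, ratio=0.05):
    """Percentage of Correct Keypoints: a keypoint counts as correct when its
    error is below a threshold set as a percentage of the image size
    (generic sketch; not any specific benchmark's exact protocol)."""
    threshold = ratio * max(img_h, img_w)
    dist = np.linalg.norm(pred - gt, axis=-1)        # per-keypoint Euclidean error
    return float((dist < threshold).mean())

pred = np.array([[10.0, 12.0], [40.0, 38.0]])
gt = np.array([[11.0, 12.0], [48.0, 40.0]])
print(pck(pred, gt, img_h=256, img_w=192, ratio=0.05))
```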
1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li
Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.
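One common remedy, shown here only as a generic sketch and not necessarily this paper's solution, is to let the weakly-augmented branch drive the running batch-norm statistics and have the strongly-augmented branch reuse them:

```python
import torch
import torch.nn as nn

def set_bn_eval(model, frozen=True):
    """Freeze or unfreeze the running statistics of all BatchNorm layers."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval() if frozen else m.train()

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
weak, strong = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)

model.train()
out_weak = model(weak)            # BN statistics updated from the weak/clean distribution
set_bn_eval(model, frozen=True)
out_strong = model(strong)        # strong branch reuses those statistics instead of its own
set_bn_eval(model, frozen=False)
```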
no code implementations • 29 Mar 2021 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang
We propose a human pose estimation framework that solves the task in the regression-based fashion.
Ranked #26 on Pose Estimation on MPII Human Pose (using extra training data)
1 code implementation • CVPR 2021 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li
To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching*.
Ranked #12 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
no code implementations • 28 Jan 2021 • Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, Hao Li
On the COCO dataset, our simple design achieves superior performance compared to both the FCOS baseline detector with NMS post-processing and the recent end-to-end NMS-free detectors.
no code implementations • 10 Nov 2020 • Zhibin Wang, Jiahang Qiu, Yong Zhou, Yuanming Shi, Liqun Fu, Wei Chen, Khaled B. Letaief
To optimize the learning performance, we formulate an optimization problem that jointly optimizes the device selection, the aggregation beamformer at the base station (BS), and the phase shifts at the IRS to maximize the number of devices participating in the model aggregation of each communication round under certain mean-squared-error (MSE) requirements.