Search Results for author: Jiaming Liu

Found 86 papers, 34 papers with code

Image Restoration using Total Variation Regularized Deep Image Prior

no code implementations 30 Oct 2018 Jiaming Liu, Yu Sun, Xiaojian Xu, Ulugbek S. Kamilov

In the past decade, sparsity-driven regularization has led to significant improvements in image reconstruction.

Deblurring Image Denoising +2

TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

no code implementations 24 Dec 2018 Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, Errui Ding

Reading text from images remains challenging due to multi-orientation, perspective distortion and especially the curved nature of irregular text.

Optical Character Recognition (OCR) Text Detection

Detecting Text in the Wild with Deep Character Embedding Network

no code implementations 2 Jan 2019 Jiaming Liu, Chengquan Zhang, Yipeng Sun, Junyu Han, Errui Ding

However, text in the wild is usually perspectively distorted or curved, which cannot be easily tackled by existing approaches.

Clustering Text Detection

Frame-Recurrent Video Inpainting by Robust Optical Flow Inference

no code implementations 8 May 2019 Yifan Ding, Chuan Wang, Haibin Huang, Jiaming Liu, Jue Wang, Liqiang Wang

Compared with image inpainting, performing this task on video presents new challenges, such as how to preserve temporal consistency and spatial details, as well as how to handle arbitrary input video size and length quickly and efficiently.

Image Inpainting Optical Flow Estimation +1

Block Coordinate Regularization by Denoising

1 code implementation NeurIPS 2019 Yu Sun, Jiaming Liu, Ulugbek S. Kamilov

In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over a small subset of the unknown variables.

Denoising

Semi-supervised Skin Detection by Network with Mutual Guidance

no code implementations ICCV 2019 Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang

In this paper we present a new data-driven method for robust skin detection from a single human portrait image.

Editing Text in the Wild

2 code implementations 8 Aug 2019 Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: text conversion module, background inpainting module and fusion module.

Image Inpainting Image-to-Image Translation +1

Online Regularization by Denoising with Applications to Phase Retrieval

no code implementations 4 Sep 2019 Zihui Wu, Yu Sun, Jiaming Liu, Ulugbek S. Kamilov

Regularization by denoising (RED) is a powerful framework for solving imaging inverse problems.

Denoising Retrieval

Disentangled Image Matting

no code implementations ICCV 2019 Shaofan Cai, Xiaoshuai Zhang, Haoqiang Fan, Haibin Huang, Jiangyu Liu, Jiaming Liu, Jiaying Liu, Jue Wang, Jian Sun

Most previous image matting methods require a roughly specified trimap as input, and estimate fractional alpha values for all pixels that are in the unknown region of the trimap.

Image Matting

Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning

no code implementations ICCV 2019 Yipeng Sun, Jiaming Liu, Wei Liu, Junyu Han, Errui Ding, Jingtuo Liu

Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data.

Infusing Learned Priors into Model-Based Multispectral Imaging

no code implementations 20 Sep 2019 Jiaming Liu, Yu Sun, Ulugbek S. Kamilov

We introduce a new algorithm for regularized reconstruction of multispectral (MS) images from noisy linear measurements.

Denoising Image Reconstruction

EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

1 code implementation 20 Sep 2019 He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts.

Entity Extraction using GAN Optical Character Recognition (OCR)

FGSD: A Dataset for Fine-Grained Ship Detection in High Resolution Satellite Images

no code implementations 15 Mar 2020 Kaiyan Chen, Ming Wu, Jiaming Liu, Chuang Zhang

To further promote research on ship detection, we introduce a new fine-grained ship detection dataset named FGSD.

Provable Convergence of Plug-and-Play Priors with MMSE denoisers

no code implementations 15 May 2020 Xiaojian Xu, Yu Sun, Jiaming Liu, Brendt Wohlberg, Ulugbek S. Kamilov

Plug-and-play priors (PnP) is a methodology for regularized image reconstruction that specifies the prior through an image denoiser.

Compressive Sensing Image Reconstruction

Deep Image Reconstruction using Unregistered Measurements without Groundtruth

no code implementations 29 Sep 2020 Weijie Gan, Yu Sun, Cihat Eldeniz, Jiaming Liu, Hongyu An, Ulugbek S. Kamilov

One of the key limitations in conventional deep learning based image reconstruction is the need for registered pairs of training images containing a set of high-quality groundtruth images.

Image Reconstruction

Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors

no code implementations ICLR 2021 Yu Sun, Jiaming Liu, Yiran Sun, Brendt Wohlberg, Ulugbek S. Kamilov

Regularization by denoising (RED) is a recently developed framework for solving inverse problems by integrating advanced denoisers as image priors.

Denoising

Practical Deep Raw Image Denoising on Mobile Devices

1 code implementation ECCV 2020 Yuzhi Wang, Haibin Huang, Qin Xu, Jiaming Liu, Yiqun Liu, Jue Wang

Deep learning-based image denoising approaches have been extensively studied in recent years, prevailing in many public benchmark datasets.

Efficient Neural Network Image Denoising

Joint Reconstruction and Calibration using Regularization by Denoising

no code implementations 26 Nov 2020 Mingyang Xie, Yu Sun, Jiaming Liu, Brendt Wohlberg, Ulugbek S. Kamilov

Cal-RED extends the traditional RED methodology to imaging problems that require the calibration of the measurement operator.

Denoising Image Reconstruction

Contextual Graph Reasoning Networks

no code implementations 1 Jan 2021 Zhaoqing Wang, Jiaming Liu, Yangyuxuan Kang, Mingming Gong, Chuang Zhang, Ming Lu, Ming Wu

Graph Reasoning has shown great potential recently in modeling long-range dependencies, which are crucial for various computer vision tasks.

2D Human Pose Estimation Instance Segmentation +4

SGD-Net: Efficient Model-Based Deep Learning with Theoretical Guarantees

1 code implementation 22 Jan 2021 Jiaming Liu, Yu Sun, Weijie Gan, Xiaojian Xu, Brendt Wohlberg, Ulugbek S. Kamilov

Deep unfolding networks have recently gained popularity in the context of solving imaging inverse problems.

CoIL: Coordinate-based Internal Learning for Imaging Inverse Problems

1 code implementation 9 Feb 2021 Yu Sun, Jiaming Liu, Mingyang Xie, Brendt Wohlberg, Ulugbek S. Kamilov

We propose Coordinate-based Internal Learning (CoIL) as a new deep-learning (DL) methodology for the continuous representation of measurements.

Image Reconstruction

Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition

1 code implementation NeurIPS 2021 Jiaming Liu, M. Salman Asif, Brendt Wohlberg, Ulugbek S. Kamilov

The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors.

Compressive Sensing Denoising

Deformation-Compensated Learning for Image Reconstruction without Ground Truth

1 code implementation 12 Jul 2021 Weijie Gan, Yu Sun, Cihat Eldeniz, Jiaming Liu, Hongyu An, Ulugbek S. Kamilov

Deep neural networks for medical image reconstruction are traditionally trained using high-quality ground-truth images as training targets.

Image Reconstruction Object

Learning-based Motion Artifact Removal Networks (LEARN) for Quantitative $R_2^\ast$ Mapping

1 code implementation 3 Sep 2021 Xiaojian Xu, Satya V. V. N. Kothapalli, Jiaming Liu, Sayan Kahali, Weijie Gan, Dmitriy A. Yablonskiy, Ulugbek S. Kamilov

LEARN-IMG performs motion correction on mGRE images and relies on the subsequent analysis for the estimation of $R_2^\ast$ maps, while LEARN-BIO directly performs motion- and $B0$-inhomogeneity-corrected $R_2^\ast$ estimation.

SamplingAug: On the Importance of Patch Sampling Augmentation for Single Image Super-Resolution

1 code implementation 30 Nov 2021 Shizun Wang, Ming Lu, Kaixin Chen, Jiaming Liu, Xiaoqi Li, Chuang Zhang, Ming Wu

However, existing methods mostly train the DNNs on uniformly sampled LR-HR patch pairs, which makes them fail to fully exploit informative patches within the image.

Data Augmentation Image Super-Resolution

Biphasic Face Photo-Sketch Synthesis via Semantic-Driven Generative Adversarial Network with Graph Representation Learning

no code implementations 5 Jan 2022 Xingqun Qi, Muyi Sun, Zijian Wang, Jiaming Liu, Qi Li, Fang Zhao, Shanghang Zhang, Caifeng Shan

To keep the generated faces structurally coordinated, the IRSG models inter-class structural relations among all facial components through graph representation learning.

Generative Adversarial Network Graph Representation Learning +1

Monotonically Convergent Regularization by Denoising

no code implementations 10 Feb 2022 Yuyang Hu, Jiaming Liu, Xiaojian Xu, Ulugbek S. Kamilov

Regularization by denoising (RED) is a widely-used framework for solving inverse problems by leveraging image denoisers as image priors.

Compressive Sensing Deblurring +2

Adaptive Patch Exiting for Scalable Single Image Super-Resolution

1 code implementation 22 Mar 2022 Shizun Wang, Jiaming Liu, Kaixin Chen, Xiaoqi Li, Ming Lu, Yandong Guo

Once the incremental capacity is below the threshold, the patch can exit at the specific layer.

Image Super-Resolution

Image Reconstruction for MRI using Deep CNN Priors Trained without Groundtruth

no code implementations 10 Apr 2022 Weijie Gan, Cihat Eldeniz, Jiaming Liu, Sihao Chen, Hongyu An, Ulugbek S. Kamilov

We propose a new plug-and-play priors (PnP) based MR image reconstruction method that systematically enforces data consistency while also exploiting deep-learning priors.

Image Reconstruction

Few-Shot Head Swapping in the Wild

no code implementations CVPR 2022 Changyong Shu, Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.

Face Swapping

MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer

1 code implementation 3 May 2022 Jinze Yu, Jiaming Liu, Xiaobao Wei, Haoyi Zhou, Yohei Nakata, Denis Gudovskiy, Tomoyuki Okuno, JianXin Li, Kurt Keutzer, Shanghang Zhang

To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels.

Domain Adaptation Object +3

Few-Shot Font Generation by Learning Fine-Grained Local Styles

2 code implementations CVPR 2022 Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Instead of explicitly disentangling global or component-wise modeling, the cross-attention mechanism can attend to the right local styles in the reference glyphs and aggregate the reference styles into a fine-grained style representation for the given content glyphs.

Font Generation

Online Deep Equilibrium Learning for Regularization by Denoising

1 code implementation 25 May 2022 Jiaming Liu, Xiaojian Xu, Weijie Gan, Shirin Shoushtari, Ulugbek S. Kamilov

However, the dependence of the computational/memory complexity of the measurement models in PnP/RED on the total number of measurements leaves DEQ impractical for many imaging applications.

Denoising

Efficient Meta-Tuning for Content-aware Neural Video Delivery

1 code implementation 20 Jul 2022 Xiaoqi Li, Jiaming Liu, Shizun Wang, Cheng Lyu, Ming Lu, Yurong Chen, Anbang Yao, Yandong Guo, Shanghang Zhang

Our method significantly reduces the computational cost and achieves even better performance, paving the way for applying neural video delivery techniques to practical applications.

Super-Resolution

Deep Model-Based Architectures for Inverse Problems under Mismatched Priors

no code implementations 26 Jul 2022 Shirin Shoushtari, Jiaming Liu, Yuyang Hu, Ulugbek S. Kamilov

While the empirical performance and theoretical properties of DMBAs have been widely investigated, the existing work in the area has primarily focused on their performance when the desired image prior is known exactly.

Uncertainty Guided Depth Fusion for Spike Camera

no code implementations 26 Aug 2022 Jianing Li, Jiaming Liu, Xiaobao Wei, Jiyuan Zhang, Ming Lu, Lei Ma, Li Du, Tiejun Huang, Shanghang Zhang

In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike cameras.

Autonomous Driving Stereo Depth Estimation

Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer

1 code implementation 26 Aug 2022 Jiaming Liu, Qizhe Zhang, Jianing Li, Ming Lu, Tiejun Huang, Shanghang Zhang

Neuromorphic spike data, an upcoming modality with high temporal resolution, has shown promising potential in real-world applications due to its inherent advantage in overcoming high-velocity motion blur.

Autonomous Driving Depth Estimation +2

Dual-Cycle: Self-Supervised Dual-View Fluorescence Microscopy Image Reconstruction using CycleGAN

no code implementations 23 Sep 2022 Tomas Kerepecky, Jiaming Liu, Xue Wen Ng, David W. Piston, Ulugbek S. Kamilov

Three-dimensional fluorescence microscopy often suffers from anisotropy, where the resolution along the axial direction is lower than that within the lateral imaging plane.

Image Reconstruction

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations 27 Sep 2022 Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, so that the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

Self-Supervised Deep Equilibrium Models for Inverse Problems with Theoretical Guarantees

no code implementations 7 Oct 2022 Weijie Gan, Chunwei Ying, Parna Eshraghi, Tongyao Wang, Cihat Eldeniz, Yuyang Hu, Jiaming Liu, Yasheng Chen, Hongyu An, Ulugbek S. Kamilov

Our numerical results on in-vivo MRI data show that SelfDEQ leads to state-of-the-art performance using only undersampled and noisy training data.

Image Reconstruction

Robustness of Deep Equilibrium Architectures to Changes in the Measurement Model

no code implementations 1 Nov 2022 Junhao Hu, Shirin Shoushtari, Zihao Zou, Jiaming Liu, Zhixin Sun, Ulugbek S. Kamilov

Deep model-based architectures (DMBAs) are widely used in imaging inverse problems to integrate physical measurement models and learned image priors.

DOLPH: Diffusion Models for Phase Retrieval

no code implementations 1 Nov 2022 Shirin Shoushtari, Jiaming Liu, Ulugbek S. Kamilov

Phase retrieval refers to the problem of recovering an image from the magnitudes of its complex-valued linear measurements.

Retrieval

DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction

no code implementations ICCV 2023 Jiaming Liu, Rushil Anirudh, Jayaraman J. Thiagarajan, Stewart He, K. Aditya Mohan, Ulugbek S. Kamilov, Hyojin Kim

Limited-Angle Computed Tomography (LACT) is a non-destructive evaluation technique used in a variety of applications ranging from security to medicine.

BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection

no code implementations 30 Nov 2022 Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang

In this paper, we propose a Multi-space Alignment Teacher-Student (MATS) framework to ease the domain shift accumulation, which consists of a Depth-Aware Teacher (DAT) and a Geometric-space Aligned Student (GAS) model.

3D Object Detection Autonomous Driving +4

BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks

no code implementations CVPR 2023 Xiaowei Chi, Jiaming Liu, Ming Lu, Rongyu Zhang, Zhaoqing Wang, Yandong Guo, Shanghang Zhang

In order to find them, we further propose a LiDAR-guided sampling strategy to leverage the statistical distribution of LiDAR to determine the heights of local slices.

3D Object Detection Autonomous Driving +1

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world

no code implementations CVPR 2023 Yulu Gan, Mingjie Pan, Rongyu Zhang, Zijian Ling, Lingran Zhao, Jiaming Liu, Shanghang Zhang

To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model.

Device-Cloud Collaboration object-detection +2

Deep Equilibrium Learning of Explicit Regularizers for Imaging Inverse Problems

1 code implementation 9 Mar 2023 Zihao Zou, Jiaming Liu, Brendt Wohlberg, Ulugbek S. Kamilov

ELDER is based on a regularization functional parameterized by a CNN and a deep equilibrium learning (DEQ) method for training the functional to be MSE-optimal at the fixed points of the reconstruction algorithm.

Image Reconstruction

A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution

no code implementations 13 Apr 2023 Huicheng Pi, Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Shunli Zhang

In these works, omnidirectional frames are projected from the 3D sphere to a 2D plane by Equi-Rectangular Projection (ERP).

Super-Resolution

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

1 code implementation CVPR 2023 Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang

Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference.

Image Super-Resolution Quantization

Open-world Semi-supervised Novel Class Discovery

1 code implementation 22 May 2023 Jiaming Liu, Yangqiming Wang, Tongze Zhang, Yulu Fan, Qinli Yang, Junming Shao

Traditional semi-supervised learning tasks assume that both labeled and unlabeled data follow the same class distribution, but the realistic open-world scenarios are of more complexity with unknown novel classes mixed in the unlabeled set.

Contrastive Learning Novel Class Discovery +1

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation

1 code implementation 7 Jun 2023 Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang

Note that our method can be regarded as a novel transfer paradigm for large-scale models, delivering promising results in adaptation to continually changing distributions.

Test-time Adaptation

UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering

no code implementations 15 Jun 2023 Mingjie Pan, Li Liu, Jiaming Liu, Peixiang Huang, Longlong Wang, Shanghang Zhang, Shaoqing Xu, Zhiyi Lai, Kuiyuan Yang

In this technical report, we present our solution, named UniOcc, for the Vision-Centric 3D occupancy prediction track in the nuScenes Open Dataset Challenge at CVPR 2023.

Prediction Of Occupancy Grid Maps

DiffuseIR: Diffusion Models For Isotropic Reconstruction of 3D Microscopic Images

no code implementations 21 Jun 2023 Mingjie Pan, Yulu Gan, Fangxu Zhou, Jiaming Liu, Aimin Wang, Shanghang Zhang, Dawei Li

Since the diffusion model learns the universal structural distribution of biological tissues, which is independent of the axial resolution, DiffuseIR can reconstruct authentic images with unseen low-axial resolutions into a high-axial resolution without requiring re-training.

Super-Resolution

PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers

no code implementations 1 Jul 2023 Peidong Jia, Jiaming Liu, Senqiao Yang, Jiarui Wu, Xiaodong Xie, Shanghang Zhang

PDM comprehensively leverages the prompt memory to extract domain-specific knowledge and explicitly constructs a long-term memory space for the data distribution, which represents better domain diversity compared to existing methods.

object-detection Object Detection

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

1 code implementation 24 Aug 2023 Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes.

3D Semantic Segmentation Few-shot 3D semantic segmentation +1

MPI-Flow: Learning Realistic Optical Flow with Multiplane Images

1 code implementation ICCV 2023 Yingping Liang, Jiaming Liu, Debing Zhang, Ying Fu

The accuracy of learning-based optical flow estimation models heavily relies on the realism of the training datasets.

Optical Flow Estimation

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision

1 code implementation 18 Sep 2023 Mingjie Pan, Jiaming Liu, Renrui Zhang, Peixiang Huang, Xiaoqi Li, Bing Wang, Hongwei Xie, Li Liu, Shanghang Zhang

3D occupancy prediction holds significant promise in the fields of robot perception and autonomous driving, which quantifies 3D scenes into grid cells with semantic labels.

Autonomous Driving

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything

no code implementations 22 Sep 2023 Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang

Firstly, to separate the target object from the scene, we propose a novel strategy to lift the multi-view 2D segmentation masks of SAM into a unified 3D variation field.

3D Object Reconstruction Object

Distribution-Aware Continual Test Time Adaptation for Semantic Segmentation

no code implementations 24 Sep 2023 Jiayi Ni, Senqiao Yang, Jiaming Liu, Xiaoqi Li, Wenyu Jiao, Ran Xu, Zehui Chen, Yi Liu, Shanghang Zhang

In this paper, we propose a distribution-aware tuning (DAT) method to make the semantic segmentation CTTA efficient and practical in real-world applications.

Autonomous Driving Semantic Segmentation +1

Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis

no code implementations 29 Sep 2023 Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov

Our second set of numerical results considers a simple and effective domain adaptation strategy that closes the performance gap due to the use of mismatched denoisers.

Domain Adaptation Image Super-Resolution

IPDreamer: Appearance-Controllable 3D Object Generation with Image Prompts

1 code implementation 9 Oct 2023 Bohan Zeng, Shanglin Li, Yutang Feng, Hong Li, Sicheng Gao, Jiaming Liu, Huaxia Li, Xu Tang, Jianzhuang Liu, Baochang Zhang

Recent advances in 3D generation have been remarkable, with methods such as DreamFusion leveraging large-scale text-to-image diffusion-based models to supervise 3D generation.

Image to 3D Object +1

ImageManip: Image-based Robotic Manipulation with Affordance-guided Next View Selection

no code implementations 13 Oct 2023 Xiaoqi Li, Yanzi Wang, Yan Shen, Ponomarenko Iaroslav, Haoran Lu, Qianxu Wang, Boshi An, Jiaming Liu, Hao Dong

This framework is designed to capture multiple perspectives of the target object and infer depth information to complement its geometry.

Object Robot Manipulation

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

no code implementations 20 Nov 2023 Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang

(2) During the distillation period, a pixel-wise frequency mask is generated via Frequency Prompt to localize the pixels of interest (PoIs) in various frequency bands.

Knowledge Distillation

FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration

no code implementations 26 Nov 2023 Zihao Zou, Jiaming Liu, Shirin Shoushtari, YuBo Wang, Weijie Gan, Ulugbek S. Kamilov

Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input.

Deblurring Image Enhancement +3

StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences

1 code implementation 28 Nov 2023 Shangkun Sun, Jiaming Liu, Thomas H. Li, Huaxia Li, Guoqing Liu, Wei Gao

To address this issue, multi-frame optical flow methods leverage adjacent frames to mitigate the local ambiguity.

Optical Flow Estimation

MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning

1 code implementation 5 Dec 2023 Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang

Motivated by this, we propose Mixture of Sparse Adapters, or MoSA, as a novel Adapter Tuning method to fully unleash the potential of each parameter in the adapter.

M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

2 code implementations 11 Dec 2023 Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, Can Qin

3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving.

3D Single Object Tracking Autonomous Driving +1

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

no code implementations 19 Dec 2023 Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts.

Self-Supervised Learning Test-time Adaptation

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

no code implementations 21 Dec 2023 Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang

In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes.

Instruction Following Language Modelling +1

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

no code implementations 24 Dec 2023 Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong

By fine-tuning the injected adapters, we preserve the inherent common sense and reasoning ability of the MLLMs while equipping them with the ability for manipulation.

Common Sense Reasoning Language Modelling +4

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

1 code implementation 26 Dec 2023 Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing

Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.

Image Generation

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations 26 Dec 2023 Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration Knowledge Distillation +1

ZONE: Zero-Shot Instruction-Guided Local Editing

no code implementations 28 Dec 2023 Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang

We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model.

Image Generation

A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

no code implementations 27 Feb 2024 Zehui Chen, Qiuchen Wang, Zhenyu Li, Jiaming Liu, Shanghang Zhang, Feng Zhao

In this report, we present our solution to the multi-task robustness track of the 1st Visual Continual Learning (VCL) Challenge at ICCV 2023 Workshop.

3D Object Detection Continual Learning +5

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model

no code implementations 12 Mar 2024 Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, Haibo Zhao

Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios.

Text-to-Image Generation

Overcoming Distribution Shifts in Plug-and-Play Methods with Test-Time Training

no code implementations 15 Mar 2024 Edward P. Chandler, Shirin Shoushtari, Jiaming Liu, M. Salman Asif, Ulugbek S. Kamilov

A common issue with the learned models is that of a performance drop when there is a distribution shift between the training and testing data.

Image Reconstruction

StableGarment: Garment-Centric Generation via Stable Diffusion

no code implementations 16 Mar 2024 Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

In this paper, we introduce StableGarment, a unified framework to tackle garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on.

Denoising Image Generation +1

Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

no code implementations 22 Mar 2024 Hongzhi Gao, Zheng Chen, Zehui Chen, Lin Chen, Jiaming Liu, Shanghang Zhang, Feng Zhao

Training high-accuracy 3D detectors necessitates massive labeled 3D annotations with 7 degrees of freedom, which is laborious and time-consuming.

3D Object Detection object-detection +2

PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition

1 code implementation 26 Mar 2024 Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley

In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating which enables the model to discern the spatial relations of tokens by encoding directional information.

Image Classification Instance Segmentation +3
