Search Results for author: Bin Ren

Found 81 papers, 25 papers with code

HiSin: Efficient High-Resolution Sinogram Inpainting via Resolution-Guided Progressive Inference

no code implementations 10 Jun 2025 Jiaze E, Srutarshi Banerjee, Tekin Bicer, Guannan Wang, yanfu Zhang, Bin Ren

High-resolution sinogram inpainting is essential for computed tomography reconstruction, as missing high-frequency projections can lead to visible artifacts and diagnostic errors.

Diagnostic

POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning

1 code implementation 4 Jun 2025 Fangyi Cao, Bin Ren, ZiHao Wang, Shiwei Fu, Youbin Mo, Xiaoyang Liu, Yuzhou Chen, Weixin Yao

With over 1,000,000 images from more than 10,000 exposures using state-of-the-art high-contrast imagers (e.g., Gemini Planet Imager, VLT/SPHERE) in the search for exoplanets, can artificial intelligence (AI) serve as a transformative tool in imaging Earth-like exoplanets in the coming decade?

Representation Learning

3D Skeleton-Based Action Recognition: A Review

no code implementations 1 Jun 2025 Mengyuan Liu, Hong Liu, Qianshuo Hu, Bin Ren, Junsong Yuan, Jiaying Lin, Jiajun Wen

To bridge this gap, our review aims to address these limitations by presenting a comprehensive, task-oriented framework for understanding skeleton-based action recognition.

Action Recognition Data Augmentation +2

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

1 code implementation 27 May 2025 Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe

While virtual try-on (VTON) systems aim to render a garment onto a target person image, this paper tackles the novel task of virtual try-off (VTOFF), which addresses the inverse problem: generating standardized product images of garments from real-world photos of clothed individuals.

Virtual Try-Off Virtual Try-on

Manifold-aware Representation Learning for Degradation-agnostic Image Restoration

no code implementations 24 May 2025 Bin Ren, Yawei Li, Xu Zheng, Yuqian Fu, Danda Pani Paudel, Ming-Hsuan Yang, Luc van Gool, Nicu Sebe

In this work, we present MIRAGE, a unified and lightweight framework for all-in-one IR that explicitly decomposes the input feature space into three semantically aligned parallel branches, each processed by a specialized module: attention for global context, convolution for local textures, and MLP for channel-wise statistics.

Contrastive Learning Image Restoration +1

Adversarial Robustness for Unified Multi-Modal Encoders via Efficient Calibration

no code implementations 17 May 2025 Chih-Ting Liao, Bin Ren, Guofeng Mei, Xu Zheng

Experiments on six modalities and three Bind-style models show that our method improves adversarial robustness by up to 47.3 percent at epsilon = 4/255, while preserving or even improving clean zero-shot and retrieval performance with less than 1 percent trainable parameters.

Adversarial Robustness

Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation

no code implementations 19 Apr 2025 Bin Ren, Eduard Zamfir, Zongwei Wu, Yawei Li, Yidi Li, Danda Pani Paudel, Radu Timofte, Ming-Hsuan Yang, Luc van Gool, Nicu Sebe

Restoring any degraded image efficiently via just one model has become increasingly significant and impactful, especially with the proliferation of mobile devices.

Benchmarking Image Restoration

The Tenth NTIRE 2025 Image Denoising Challenge Report

1 code implementation 16 Apr 2025 Lei Sun, Hang Guo, Bin Ren, Luc van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, JingYu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi, Aman Kukretti, Gajender Sharma, Sriharsha Koundinya, Asim Manna, Jun Cheng, Shan Tan, Jun Liu, Jiangwei Hao, Jianping Luo, Jie Lu, Satya Narayan Tazi, Arnim Gautam, Aditi Pawar, Aishwarya Joshi, Akshay Dudhane, Praful Hambadre, Sachin Chaudhary, Santosh Kumar Vipparthi, Subrahmanyam Murala, Jiachen Tu, Nikhil Akalwadi, Vijayalaxmi Ashok Aralikatti, Dheeraj Damodar Hegde, G Gyaneshwar Rao, Jatin Kalal, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudenagudi, Zhenyuan Lin, Yubo Dong, Weikun Li, Anqi Li, Ang Gao, Weijun Yuan, Zhan Li, Ruting Deng, Yihang Chen, Yifan Deng, Zhanglu Chen, Boyang Yao, Shuling Zheng, Feng Zhang, Zhiheng Fu, Anas M. Ali, Bilel Benjdira, Wadii Boulila, Jan Seny, Pei Zhou, Jianhua Hu, K. L. Eddie Law, Jaeho Lee, M. J. Aashik Rasool, Abdur Rehman, SMA Sharif, Seongwan Kim, Alexandru Brateanu, Raul Balmez, Ciprian Orhei, Cosmin Ancuti, Zeyu Xiao, Zhuoyuan Li, Ziqi Wang, Yanyan Wei, Fei Wang, Kun Li, Shengeng Tang, Yunkai Zhang, Weirun Zhou, Haoxuan Lu

This paper presents an overview of the NTIRE 2025 Image Denoising Challenge ($\sigma = 50$), highlighting the proposed methodologies and corresponding results.

Image Denoising valid

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

3 code implementations 14 Apr 2025 Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, Haibo Lei, Qifang Gao, Yaqing Li, Weihua Luo, Tsing Li, Qing Wang, Yi Liu, Yang Wang, Hongyu An, Liou Zhang, Shijie Zhao, Lianhong Song, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Jing Wei, Mengyang Wang, Ruilong Guo, Qian Wang, Qingliang Liu, Yang Cheng, Davinci, Enxuan Gu, Pinxin Liu, Yongsheng Yu, Hang Hua, Yunlong Tang, Shihao Wang, ZhiYu Zhang, Yukun Yang, Jiyu Wu, Jiancheng Huang, Yifan Liu, Yi Huang, Shifeng Chen, Rui Chen, Yi Feng, Mingxi Li, Cailu Wan, XiangJi Wu, Zibin Liu, Jinyang Zhong, Kihwan Yoon, Ganzorig Gankhuyag, Shengyun Zhong, Mingyang Wu, Renjie Li, Yushen Zuo, Zhengzhong Tu, Zongang Gao, Guannan Chen, Yuan Tian, Wenhui Chen, Weijun Yuan, Zhan Li, Yihang Chen, Yifan Deng, Ruting Deng, Yilin Zhang, Huan Zheng, Yanyan Wei, Wenxuan Zhao, Suiyi Zhao, Fei Wang, Kun Li, Yinggan Tang, Mengjie Su, Jae-Hyeon Lee, Dong-Hyeop Son, Ui-Jin Choi, Tiancheng Shao, Yuqing Zhang, Mengcheng Ma, Donggeun Ko, Youngsang Kwak, Jiun Lee, Jaehwa Kwak, YuXuan Jiang, Qiang Zhu, Siyue Teng, Fan Zhang, Shuyuan Zhu, Bing Zeng, David Bull, Jing Hu, Hui Deng, Xuan Zhang, Lin Zhu, Qinrui Fan, Weijian Deng, Junnan Wu, Wenqin Deng, Yuquan Liu, Zhaohong Xu, Jameer Babu Pinjari, Kuldeep Purohit, Zeyu Xiao, Zhuoyuan Li, Surya Vashisth, Akshay Dudhane, Praful Hambarde, Sachin Chaudhary, Satya Naryan Tazi, Prashant Patil, Santosh Kumar Vipparthi, Subrahmanyam Murala, Wei-Chen Shen, I-Hsiang Chen, Yunzhe Xu, Chen Zhao, Zhizhou Chen, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Alejandro Merino, Bruno Longarela, Javier Abad, Marcos V. Conde, Simone Bianco, Luca Cogo, Gianmarco Corti

This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR).

Super-Resolution valid

Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

1 code implementation 23 Mar 2025 Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc van Gool, Xuming Hu

Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI), particularly in enhancing the capabilities of large language models (LLMs) by enabling access to external, reliable, and up-to-date knowledge sources.

3D Generation Medical Report Generation +4

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

1 code implementation 23 Mar 2025 Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc van Gool, Martin R. Oswald, Danda Pani Paudel

In order to power the proposed methods, we introduce SceneSplat-7K, the first large-scale 3DGS dataset for indoor scenes, comprising 6868 scenes derived from 7 established datasets such as ScanNet and Matterport3D.

3DGS Benchmarking +2

Fractal-IR: A Unified Framework for Efficient and Scalable Image Restoration

no code implementations 22 Mar 2025 Yawei Li, Bin Ren, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Nicu Sebe, Ming-Hsuan Yang, Luca Benini

While vision transformers achieve significant breakthroughs in various image restoration (IR) tasks, it is still challenging to efficiently scale them across multiple types of degradations and resolutions.

Deblurring Demosaicking +4

ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle

1 code implementation 4 Jan 2025 Yinchuan Wang, Bin Ren, Xiang Zhang, Pengyu Wang, Chaoqun Wang, Rui Song, Yibin Li, Max Q.-H. Meng

In this article, a LiDAR-based SLAM method is presented to improve the accuracy of pose estimations for ground vehicles in rough terrains, which is termed Rotation-Optimized LiDAR-Only (ROLO) SLAM.

Pose Estimation

Hierarchical Information Flow for Generalized Efficient Image Restoration

no code implementations 27 Nov 2024 Yawei Li, Bin Ren, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Nicu Sebe, Ming-Hsuan Yang, Luca Benini

To strike a balance between efficiency and model capacity for a generalized transformer-based IR method, we propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR, which progressively propagates information among pixels in a bottom-up manner.

Color Image Denoising Grayscale Image Denoising +5

Hierarchical Cross-Attention Network for Virtual Try-On

no code implementations 23 Nov 2024 Hao Tang, Bin Ren, Pingping Wu, Nicu Sebe

In this paper, we present an innovative solution for the challenges of the virtual try-on task: our novel Hierarchical Cross-Attention Network (HCANet).

Geometric Matching Virtual Try-on

FCDM: A Physics-Guided Bidirectional Frequency Aware Convolution and Diffusion-Based Model for Sinogram Inpainting

no code implementations 26 Aug 2024 Jiaze E, Srutarshi Banerjee, Tekin Bicer, Guannan Wang, yanfu Zhang, Bin Ren

Computed tomography (CT) is widely used in industrial and medical imaging, but sparse-view scanning reduces radiation exposure at the cost of incomplete sinograms and challenging reconstruction.

Computed Tomography (CT) CT Reconstruction +2

Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

no code implementations 26 Aug 2024 Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

By transferring knowledge from teacher to student, the student network can better adapt to complex dynamic scenes with incomplete observations.

Generative Adversarial Network

ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

no code implementations 20 Aug 2024 Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc van Gool, Danda Pani Paudel

In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.

3DGS Representation Learning

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

1 code implementation 8 Jul 2024 Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance.

Contrastive Learning Data Augmentation +2

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

1 code implementation 30 May 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Ming-Hsuan Yang, Nicu Sebe

Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process, as they contribute essential contextual cues crucial for accurate reconstruction.

Image Restoration

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

no code implementations 21 Apr 2024 Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications.

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

3 code implementations 16 Apr 2024 Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

no code implementations 29 Feb 2024 Wei Niu, Gagan Agrawal, Bin Ren

Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs.

Code Generation

Key-Graph Transformer for Image Restoration

no code implementations 4 Feb 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution.

Graph Attention Image Restoration

Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation

no code implementations 4 Feb 2024 Ti Wang, Mengyuan Liu, Hong Liu, Bin Ren, Yingxuan You, Wenhao Li, Nicu Sebe, Xia Li

We observe that previous optimization-based methods commonly rely on a projection constraint, which only ensures alignment in 2D space, potentially leading to overfitting.

3D Human Pose Estimation

Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)

no code implementations 13 Dec 2023 Dong Li, Ruoming Jin, Bin Ren

Inspired by the success of contrastive learning, we systematically examine recommendation losses, including listwise (softmax), pairwise (BPR), and pointwise (MSE and CCL) losses.

Contrastive Learning

ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models

no code implementations 5 Dec 2023 Weijie Wang, Wenqi Ren, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Nicu Sebe, Bruno Lepri

To address this, we construct scene graphs to capture spatial relationships among objects and apply a graph matching algorithm to these graphs to accurately identify matched objects.

Decoder Graph Matching +3

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

2 code implementations CVPR 2023 Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

To reconcile this, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to a minimum.

Video Super-Resolution

Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models

no code implementations 10 Jan 2023 Mengyi Zhao, Mengyuan Liu, Bin Ren, Shuling Dai, Nicu Sebe

Diffusion-based generative models have recently emerged as powerful solutions for high-quality synthesis in multiple domains.

Denoising Motion Generation

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge

no code implementations CVPR 2023 Changdi Yang, Pu Zhao, Yanyu Li, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Bin Ren, Xue Lin, Yanzhi Wang

With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications.

Autonomous Driving Segmentation +1

Towards Reliable Item Sampling for Recommendation Evaluation

no code implementations 28 Nov 2022 Dong Li, Ruoming Jin, Zhenming Liu, Bin Ren, Jing Gao, Zhi Liu

Since Rendle and Krichene argued that commonly used sampling-based evaluation metrics are "inconsistent" with respect to the global metrics (even in expectation), there have been a few studies on the sampling-based recommender system evaluation.

Recommendation Systems

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

no code implementations 12 Nov 2022 Hao Tang, Lei Ding, Songsong Wu, Bin Ren, Nicu Sebe, Paolo Rota

The proposed TSDPC is a generic and powerful framework with two advantages over previous works, one of which is that it can calculate the number of key frames automatically.

Classification Video Classification

SparCL: Sparse Continual Learning on the Edge

1 code implementation 20 Sep 2022 Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.

Continual Learning

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

no code implementations 29 Aug 2022 Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen

It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future work to explore.

Deep Learning Survey

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

1 code implementation 25 Jul 2022 Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.

Neural Architecture Search SSIM +1

PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation

1 code implementation 9 Jul 2022 Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe

For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images.

Generative Adversarial Network

CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework

no code implementations 21 Jun 2022 Xiaofeng Li, Bin Ren, Xipeng Shen, Yanzhi Wang

There is a growing demand for shifting the delivery of AI capability from data centers on the cloud to edge or end devices, exemplified by the fast emerging real-time AI-based apps running on smartphones, AR/VR devices, autonomous vehicles, and various IoT devices.

Autonomous Vehicles

Real-Time Portrait Stylization on the Edge

no code implementations 2 Jun 2022 Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

In this work we demonstrate real-time portrait stylization, specifically, translating self-portrait into cartoon or anime style on mobile devices.

Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

1 code implementation CVPR 2023 Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang

In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm, and their corresponding PEs are occluded.

Federated Learning Position

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

1 code implementation 27 Dec 2021 Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms.

Efficient ViTs image-classification +1

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

no code implementations 22 Nov 2021 Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.

Model Compression

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

1 code implementation NeurIPS 2021 Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Systematic evaluations of accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works.

Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation

1 code implementation 19 Oct 2021 Bin Ren, Hao Tang, Nicu Sebe

To ease this problem, we propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network in the first stage and one refined pixel-level loss in the second stage.

Decoder Translation

HFSP: A Hardware-friendly Soft Pruning Framework for Vision Transformers

no code implementations 29 Sep 2021 Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult.

image-classification Image Classification +1

On the regularization landscape for the linear recommendation models

no code implementations 29 Sep 2021 Dong Li, Zhenming Liu, Ruoming Jin, Zhi Liu, Jing Gao, Bin Ren

Recently, a wide range of recommendation algorithms inspired by deep learning techniques have emerged as the performance leaders on several standard recommendation benchmarks.

DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

no code implementations 30 Aug 2021 Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices.

Code Generation

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

no code implementations 25 Aug 2021 Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

It necessitates the sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving a high sparse model accuracy.

Code Generation Compiler Optimization

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

no code implementations ICCV 2021 Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices.

Image Super-Resolution Neural Architecture Search +1

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

no code implementations 6 Jun 2021 Xuan Shen, Geng Yuan, Wei Niu, Xiaolong Ma, Jiexiong Guan, Zhengang Li, Bin Ren, Yanzhi Wang

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition creates an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms.

Autonomous Driving Multi-Person Pose Estimation

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

no code implementations 30 May 2021 Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model to meet both resource and real-time specifications of mobile devices.

Question Answering Text Generation

Cloth Interactive Transformer for Virtual Try-On

1 code implementation 12 Apr 2021 Bin Ren, Hao Tang, Fanyang Meng, Runwei Ding, Philip H. S. Torr, Nicu Sebe

In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.

Virtual Try-on

ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning

no code implementations 20 Nov 2020 Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao

Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements in both accuracy and compression ratio on the tested CNN models and datasets, under similar limited training time.

On Efficient Constructions of Checkpoints

no code implementations ICML 2020 Yu Chen, Zhenming Liu, Bin Ren, Xin Jin

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models.

Quantization

Real-Time Execution of Large-scale Language Models on Mobile

no code implementations 15 Sep 2020 Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices, thus achieving real-time execution of large transformer-based models like BERT variants.

Edge-computing

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

3 code implementations 12 Sep 2020 Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

In this work, we propose the YOLObile framework, which enables real-time object detection on mobile devices via compression-compilation co-design.

Computational Efficiency Object +2

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

no code implementations 20 Jul 2020 Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Sijia Liu, Xue Lin, Bin Ren

The vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism.

Code Generation Model Compression

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

no code implementations 22 Apr 2020 Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren

High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.

Compiler Optimization Style Transfer +1

CoCoPIE: Making Mobile AI Sweet As PIE --Compression-Compilation Co-Design Goes a Long Way

no code implementations 14 Mar 2020 Shaoshan Liu, Bin Ren, Xipeng Shen, Yanzhi Wang

Assuming hardware is the major constraint for enabling real-time mobile intelligence, the industry has mainly dedicated their efforts to developing specialized hardware accelerators for machine learning and inference.

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

no code implementations 13 Mar 2020 Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.

Model Compression Privacy Preserving

A Survey on 3D Skeleton-Based Action Recognition Using Learning Method

no code implementations 14 Feb 2020 Bin Ren, Mengyuan Liu, Runwei Ding, Hong Liu

To the best of our knowledge, this research represents the first comprehensive discussion of deep learning-based action recognition using 3D skeleton data.

Action Recognition Deep Learning +1

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

no code implementations ECCV 2020 Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms.

Code Generation Compiler Optimization

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

no code implementations 1 Jan 2020 Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss.

Code Generation Model Compression

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

no code implementations 6 Sep 2019 Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method.

Model Compression

An Exo-Kuiper Belt and An Extended Halo around HD 191089 in Scattered Light

1 code implementation 31 Jul 2019 Bin Ren, Élodie Choquet, Marshall D. Perrin, Gaspard Duchêne, John H. Debes, Laurent Pueyo, Malena Rice, Christine Chen, Glenn Schneider, Thomas M. Esposito, Charles A. Poteet, Jason J. Wang, S. Mark Ammons, Megan Ansdell, Pauline Arriaga, Vanessa P. Bailey, Travis Barman, Juan Sebastián Bruzzone, Joanna Bulger, Jeffrey Chilcote, Tara Cotten, Robert J. De Rosa, Rene Doyon, Michael P. Fitzgerald, Katherine B. Follette, Stephen J. Goodsell, Benjamin L. Gerard, James R. Graham, Alexandra Z. Greenbaum, J. Brendan Hagan, Pascale Hibon, Dean C. Hines, Li-Wei Hung, Patrick Ingraham, Paul Kalas, Quinn Konopacky, James E. Larkin, Bruce Macintosh, Jérôme Maire, Franck Marchis, Christian Marois, Johan Mazoyer, François Ménard, Stanimir Metchev, Maxwell A. Millar-Blanchaer, Tushar Mittal, Magaret Moerchen, Eric L. Nielsen, Mamadou N'Diaye, Rebecca Oppenheimer, David Palmer, Jennifer Patience, Christophe Pinte, Lisa Poyneer, Abhijith Rajan, Julien Rameau, Fredrik T. Rantakyrö, Jean-Baptiste Ruffio, Dominic Ryan, Dmitry Savransky, Adam C. Schneider, Anand Sivaramakrishnan, Inseok Song, Rémi Soummer, Christopher Stark, Sandrine Thomas, Arthur Vigan, J. Kent Wallace, Kimberly Ward-Duong, Sloane Wiktorowicz, Schuyler Wolff, Marie Ygouf, Colin Norman

We have obtained Hubble Space Telescope STIS and NICMOS, and Gemini/GPI scattered light images of the HD 191089 debris disk.

Earth and Planetary Astrophysics Solar and Stellar Astrophysics

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

no code implementations 2 May 2019 Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren

With the rapid emergence of a spectrum of high-end mobile devices, many applications that required desktop-level computation capability formerly can now run on these devices without any problem.

All Model Compression
