Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition

Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang

With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction.

Class Incremental Learning Domain Adaptation

Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz

High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV).

SpectralGPT: Spectral Foundation Model

Danfeng Hong, Bing Zhang, Xuyang Li, YuXuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Gamba Paolo, Jon Atli Benediktsson, Jocelyn Chanussot

The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner.

Change Detection Representation Learning

InfMLLM: A Unified Framework for Visual-Language Tasks

Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi

Our experiments demonstrate that preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding.

Image Captioning Instruction Following

Machine Learning Parameterization of the Multi-scale Kain-Fritsch (MSKF) Convection Scheme

Xiaohui Zhong, Xing Yu, Hao Li

The Weather Research and Forecast (WRF) model is used to generate training and testing data over South China at a horizontal resolution of 5 km.

Promise: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models

Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

To address prevalent issues in medical imaging, such as data acquisition challenges and label availability, transfer learning from natural to medical image domains serves as a viable strategy to produce reliable segmentation results.

Image Segmentation Medical Image Segmentation

FuXi-Extreme: Improving extreme rainfall and wind forecasts with diffusion model

Xiaohui Zhong, Lei Chen, Jun Liu, Chensen Lin, Yuan Qi, Hao Li

State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF).

Denoising Weather Forecasting

Unpaired MRI Super Resolution with Self-Supervised Contrastive Learning

Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv

High-resolution (HR) magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings.

Contrastive Learning Image Super-Resolution

On Generative Agents in Recommendation

An Zhang, Leheng Sheng, Yuxin Chen, Hao Li, Yang Deng, Xiang Wang, Tat-Seng Chua

Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development.

Collaborative Filtering Movie Recommendation

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models.

Speech Synthesis Voice Cloning

Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

Danfeng Hong, Bing Zhang, Hao Li, YuXuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions).

Domain Adaptation Segmentation

DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

Xiao-Yin Liu, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou

However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and the previous methods ignore differences between the model data, which brings great conservatism.

D4RL Model-based Reinforcement Learning

Implicit Neural Representation for MRI Parallel Imaging Reconstruction

Hao Li, Yusheng Zhou, Jianan Liu, Xiling Liu, Tao Huang, Zhihan Lv

In this paper, we propose a novel MRI PI reconstruction method based on INR, which represents the reconstructed fully-sampled images as the function of voxel coordinates and prior feature vectors of undersampled images to overcome the generalization problem of INR.

MRI Reconstruction

Research on Damage Analysis of Key Parts of UAV Flight Control System

Tianshun Li, Huaimin Chen, Ben Xiao, Hao Li, Shiyu Hao, Di Hai, Xuetong Wang

A set of hardware-in-the-loop simulation methods based on the UAV model is proposed to create fault data, which is used to determine the parts in which faults occur.

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification Automatic Speech Recognition

Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in depth domain and utilizes depth data at test time for fall detection.

Domain Adaptation

False Negative/Positive Control for SAM on Noisy Medical Images

Xing Yao, Han Liu, Dewei Hu, Daiwei Lu, Ange Lou, Hao Li, Ruining Deng, Gabriel Arenas, Baris Oguz, Nadav Schwartz, Brett C Byram, Ipek Oguz

The method couples multi-box prompt augmentation and an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy.

Image Segmentation Medical Image Segmentation

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

Junkai Xu, Liang Peng, Haoran Cheng, Hao Li, Wei Qian, Ke Li, Wenxiao Wang, Deng Cai

To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.

Monocular 3D Object Detection object-detection

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang

Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task.

Continual Learning Reinforcement Learning (RL)

CATS v2: Hybrid encoders for robust medical segmentation

Hao Li, Han Liu, Dewei Hu, Xing Yao, Jiacheng Wang, Ipek Oguz

In our previous work, we proposed CATS, which is a U-shaped segmentation network augmented with a transformer encoder.

Domain Adaptation Image Segmentation

XMem++: Production-level Video Segmentation From Few Annotated Frames

Maksym Bekuzarov, Ariana Bermudez, Joon-Young Lee, Hao Li

Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production.

Segmentation Semantic Segmentation

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling Speech Synthesis

COLosSAL: A Benchmark for Cold-start Active Learning for 3D Medical Image Segmentation

Han Liu, Hao Li, Xing Yao, Yubo Fan, Dewei Hu, Benoit Dawant, Vishwesh Nath, Zhoubing Xu, Ipek Oguz

Cold-start AL is highly relevant in many practical scenarios but has been under-explored, especially for 3D medical segmentation tasks requiring substantial annotation effort.

Active Learning Image Segmentation

Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation

Hao Li, Zhendong Yuan, Gabriel Dax, Gefei Kong, Hongchao Fan, Alexander Zipf, Martin Werner

In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1.

object-detection Object Detection

FuXi: A cascade machine learning forecasting system for 15-day global weather forecast

Lei Chen, Xiaohui Zhong, Feng Zhang, Yuan Cheng, Yinghui Xu, Yuan Qi, Hao Li

Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degrees.

Weather Forecasting

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu

We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.

Benchmarking Object Recognition

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning Visual Navigation

OVO: Open-Vocabulary Occupancy

Zhiyu Tan, ZiChao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li

Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment.

Knowledge Distillation

Do You Hear The People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation

Hao Li, Viktor Schlegel, Riza Batista-Navarro, Goran Nenadic

Furthermore, evaluating key points is crucial in ensuring that the automatically generated summaries are useful.

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.

Retrieval Video Retrieval

TG-VQA: Ternary Game of Video Question Answering

Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning the alignment semantics within them.

Contrastive Learning Question Answering

Correcting for Interference in Experiments: A Case Study at Douyin

Vivek F. Farias, Hao Li, Tianyi Peng, Xinyuyang Ren, Huawei Zhang, Andrew Zheng

We formalize the problem of inference in such experiments as one of policy evaluation.

COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training

Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic

We revisit the problem from a perspective of partial label supervision signals and identify two signals derived from ground truth and one from pseudo labels.

Organ Segmentation Outlier Detection

CryoFormer: Continuous Heterogeneous Cryo-EM Reconstruction using Transformer-based Neural Representations

Xinhang Liu, Yan Zeng, Yifan Qin, Hao Li, Jiakai Zhang, Lan Xu, Jingyi Yu

Cryo-electron microscopy (cryo-EM) allows for the high-resolution reconstruction of 3D structures of proteins and other biomolecules.

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li

In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold.

3D Object Detection 6D Pose Estimation using RGB

Learning A Sparse Transformer Network for Effective Image Deraining

Xiang Chen, Hao Li, Mingqiang Li, Jinshan Pan

To overcome this problem, we propose an effective DeRaining network, Sparse Transformer (DRSformer) that can adaptively keep the most useful self-attention values for feature aggregation so that the aggregated features better facilitate high-quality image reconstruction.

Image Reconstruction Image Restoration

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).

Retrieval Video Retrieval

Video Action Recognition with Attentive Semantic Units

Yifei Chen, Dapeng Chen, Ruijin Liu, Hao Li, Wei Peng

Supervised by the semantics of action labels, recent works adapt the visual branch of VLMs to learn video representations.

Action Recognition Temporal Action Localization

TwERC: High Performance Ensembled Candidate Generation for Ads Recommendation at Twitter

Vanessa Cai, Pradeep Prabakar, Manuel Serrano Rebuelta, Lucas Rosen, Federico Monti, Katarzyna Janocha, Tomo Lazovich, Jeetu Raj, Yedendra Shrinivasan, Hao Li, Thomas Markovich

We focus on the candidate generation phase of a large-scale ads recommendation problem in this paper, and present a machine learning first heterogeneous re-architecture of this stage which we term TwERC.

Recommendation Systems Vocal Bursts Intensity Prediction

An Adaptive Plug-and-Play Network for Few-Shot Learning

Hao Li, Li Li, Yunmeng Huang, Ning Li, Yongtao Zhang

Few-shot learning (FSL) requires a model to classify new samples after learning from only a few samples.

Few-Shot Learning

Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt

Hao Li, Dingwen Zhang, Nian Liu, Lechao Cheng, Yalun Dai, Chao Zhang, Xinggang Wang, Junwei Han

Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.

Instance Segmentation Semantic Segmentation

UNAEN: Unsupervised Abnormality Extraction Network for MRI Motion Artifact Reduction

Yusheng Zhou, Hao Li, Jianan Liu, Zhengmin Kong, Tao Huang, Euijoon Ahn, Zhihan Lv, Jinman Kim, David Dagan Feng

Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies.

OccluMix: Towards De-Occlusion Virtual Try-on by Semantically-Guided Mixup

Zhijing Yang, Junyang Chen, Yukai Shi, Hao Li, Tianshui Chen, Liang Lin

Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities.

Semantic Parsing Virtual Try-on

StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis

Hao Li, Xianxu Hou, Zepeng Huang, Linlin Shen

As cycle-like losses are designed to measure the L_2 distances between the output of Gene Decoder and image encoder, and that between the output of LGE and IGE, only face images are required to train our framework, i.e., no paired kinship face data is required.

Kinship face generation

Guided Recommendation for Model Fine-Tuning

Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto

With thousands of historical training jobs, a recommendation system can be learned to predict the model selection score given the features of the dataset and the model as input.

Model Selection Transfer Learning

Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds

Yu Pei, Xian Zhao, Hao Li, Jingyuan Ma, Jingwei Zhang, ShiLiang Pu

Attributed to the unstructured and sparse nature of point clouds, the transformer shows greater potential in point clouds data processing.

3D Object Detection object-detection

Biomedical image analysis competitions: The state of current participation practice

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

Sicheng Li, Hao Li, Yue Wang, Yiyi Liao, Lu Yu

Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering.

Novel View Synthesis

Entropy-Driven Mixed-Precision Quantization for Deep Network Design

Zhenhong Sun, Ce Ge, Junyan Wang, Ming Lin, Hesen Chen, Hao Li, Xiuyu Sun

Deploying deep convolutional neural networks on Internet-of-Things (IoT) devices is challenging due to the limited computational resources, such as limited SRAM memory and Flash storage.

Face Detection Hardware Aware Neural Architecture Search

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

Bayesian Layer Graph Convolutional Network for Hyperspectral Image Classification

Mingyang Zhang, Ziqi Di, Maoguo Gong, Yue Wu, Hao Li, Xiangming Jiang

In recent years, research on hyperspectral image (HSI) classification has made continuous progress in introducing deep network models, and recently graph convolutional network (GCN) based models have shown impressive performance.

Classification Image Classification

Detecting Line Segments in Motion-blurred Images with Events

Huai Yu, Hao Li, Wen Yang, Lei Yu, Gui-Song Xia

To robustly detect line segments over motion blurs, we propose to leverage the complementary information of images and events.

3D Reconstruction Line Segment Detection

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li

Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem.

Towards Consistency and Complementarity: A Multiview Graph Information Bottleneck Approach

Xiaolong Fan, Maoguo Gong, Yue Wu, Mingyang Zhang, Hao Li, Xiangming Jiang

In this paper, we propose a novel Multiview Variational Graph Information Bottleneck (MVGIB) principle to maximize the agreement for common representations and the disagreement for view-specific representations.

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning Optical Character Recognition (OCR)

MimCo: Masked Image Modeling Pre-training with Contrastive Teacher

Qiang Zhou, Chaohui Yu, Hao Luo, Zhibin Wang, Hao Li

Specifically, MimCo takes a pre-trained contrastive learning model as the teacher model and is pre-trained with two types of learning targets: patch-level and image-level reconstruction losses.

Contrastive Learning Self-Supervised Learning

Cats: Complementary CNN and Transformer Encoders for Segmentation

Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, Ipek Oguz

We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results.

3D Medical Imaging Segmentation Image Segmentation

SBPF: Sensitiveness Based Pruning Framework For Convolutional Neural Network On Image Classification

Yiheng Lu, Maoguo Gong, Wei Zhao, Kaiyuan Feng, Hao Li

Therefore, we propose a sensitiveness based method to evaluate the importance of each layer from the perspective of inference accuracy by adding extra damage for the original model.

Image Classification

Semantic Data Augmentation based Distance Metric Learning for Domain Generalization

Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li

Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization.

Data Augmentation Domain Generalization

DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer

Hao Li, Zhijing Yang, Xiaobin Hong, Ziying

Real-world image denoising is a practical image restoration problem that aims to obtain clean images from in-the-wild noisy inputs.

Image Denoising Image Restoration

Criteria Comparative Learning for Real-scene Image Super-Resolution

2 code implementations • 26 Jul 2022 • Yukai Shi, Hao Li, Sen Zhang, Zhijing Yang, Xiao Wang

Inspired by the observation that the contrastive relationship could also exist between the criteria, in this work, we propose a novel training paradigm for RealSR, named Criteria Comparative Learning (Cria-CL), by developing contrastive losses defined on criteria instead of image patches.

Contrastive Learning Image Super-Resolution

Large-Kernel Attention for 3D Medical Image Segmentation

no code implementations • 19 Jul 2022 • Hao Li, Yang Nan, Javier Del Ser, Guang Yang

The performance improvement due to the proposed LK attention module was also statistically validated.

Computed Tomography (CT) Image Segmentation

Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars

no code implementations • 16 Jul 2022 • Dongjiang Cao, Ruofeng Liu, Hao Li, Shuai Wang, Wenchao Jiang, Chris Xiaoxuan Lu

Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics, etc.

Metric Learning Person Re-Identification

Dynamic Gradient Reactivation for Backward Compatible Person Re-identification

no code implementations • 12 Jul 2022 • Xiao Pan, Hao Luo, Weihua Chen, Fan Wang, Hao Li, Wei Jiang, Jianming Zhang, Jianyang Gu, Peike Li

To address this issue, we propose the Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features.

Person Re-Identification Retrieval

Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives

no code implementations • 12 Jul 2022 • Hao Li, Zeyu Tang, Yang Nan, Guang Yang

Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales.

Computed Tomography (CT)

DLME: Deep Local-flatness Manifold Embedding

2 code implementations • 7 Jul 2022 • Zelin Zang, Siyuan Li, Di Wu, Ge Wang, Lei Shang, Baigui Sun, Hao Li, Stan Z. Li

To overcome the underconstrained embedding problem, we design a loss and theoretically demonstrate that it leads to a more suitable embedding based on the local flatness.

Contrastive Learning Data Augmentation

Location reference recognition from texts: A survey and comparison

no code implementations • 4 Jul 2022 • Xuke Hu, Zhiyong Zhou, Hao Li, Yingjie Hu, Fuqiang Gu, Jens Kersten, Hongchao Fan, Friederike Klan

Further, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking.

Information Retrieval Management

CGAR: Critic Guided Action Redistribution in Reinforcement Learning

1 code implementation • 23 Jun 2022 • Tairan Huang, Xu Li, Hao Li, Mingming Sun, Ping Li

As discussed in this paper, under the settings of the off-policy actor critic algorithms, we demonstrate that the critic can bring expected discounted rewards greater than or at least equal to those of the actor.

Reinforcement Learning (RL)

Real-World Image Super-Resolution by Exclusionary Dual-Learning

1 code implementation • 6 Jun 2022 • Hao Li, Jinghui Qin, Zhijing Yang, Pengxu Wei, Jinshan Pan, Liang Lin, Yukai Shi

Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, and it has recently received considerable attention owing to its tremendous application potential.

Image Restoration Image Super-Resolution

Point RCNN: An Angle-Free Framework for Rotated Object Detection

no code implementations • 28 May 2022 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Hao Li

To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN, which mainly consists of PointRPN and PointReg.

object-detection Object Detection In Aerial Images

SwinVRNN: A Data-Driven Ensemble Forecasting Model via Learned Distribution Perturbation

no code implementations • 26 May 2022 • Yuan Hu, Lei Chen, Zhibin Wang, Hao Li

We also compare four categories of perturbation methods for ensemble forecasting, i.e., fixed distribution perturbation, learned distribution perturbation, MC dropout, and multi-model ensemble.

Weather Forecasting

An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

no code implementations • 25 May 2022 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

With our empirical result obtained from 1,330 models, we provide the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a proper pre-trained model respecting the data property; 2) specialized algorithms further improve the robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlation and CORAL for large-scale out-of-distribution data; 3) comparing different pre-training modes, architectures and data sizes, we provide novel observations about pre-training on distribution shift, which sheds light on designing or selecting pre-training strategy for different kinds of distribution shifts.

Data Augmentation

Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

no code implementations • 13 May 2022 • Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng

However, the difference in degradation representations between synthetic and authentic LR images suppresses the quality of SR images reconstructed from authentic LR images.

Image Registration Representation Learning

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution

Joint learning of object graph and relation graph for visual question answering

no code implementations • 9 May 2022 • Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

Modeling visual question answering (VQA) through scene graphs can significantly improve the reasoning accuracy and interpretability.

Question Answering Visual Question Answering

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

no code implementations • CVPR 2022 • Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar

We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.

Task Adaptive Parameter Sharing for Multi-Task Learning

1 code implementation • CVPR 2022 • Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto

TAPS solves a joint optimization problem which determines which layers to share with the base model and the value of the task-specific weights.

Multi-Task Learning

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

1 code implementation • CVPR 2022 • Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li

The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.

3D Object Detection 6D Pose Estimation using RGB

PMAL: Open Set Recognition via Robust Prototype Mining

no code implementations • 16 Mar 2022 • Jing Lu, Yunxu Xu, Hao Li, Zhanzhan Cheng, Yi Niu

Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between knowns and unknowns.

Open Set Learning

ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

1 code implementation • 7 Mar 2022 • Han Liu, Yubo Fan, Hao Li, Jiacheng Wang, Dewei Hu, Can Cui, Ho Hin Lee, Huahong Zhang, Ipek Oguz

Previously, a training strategy termed Modality Dropout (ModDrop) has been applied to MS lesion segmentation to achieve the state-of-the-art performance with missing modality.

Lesion Segmentation

On Representation Learning with Feedback

1 code implementation 15 Feb 2022 Hao Li

This note complements the author's recent paper "Robust representation learning with feedback for single image deraining" by providing heuristic theoretical explanations of the mechanism of representation learning with feedback, an essential merit of the work presented in that paper.

Representation Learning Single Image Deraining

GiraffeDet: A Heavy-Neck Paradigm for Object Detection

2 code implementations ICLR 2022 Yiqi Jiang, Zhiyu Tan, Junyan Wang, Xiuyu Sun, Ming Lin, Hao Li

This heavy-backbone design paradigm is mostly due to the historical legacy when transferring image recognition models to object detection rather than an end-to-end optimized design for object detection.

Object Detection

Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer

no code implementations 21 Jan 2022 Pichao Wang, Fan Wang, Hao Li

During the KD process, the TCL loss transfers the local structure, exploits the higher order information, and mitigates the misalignment of the heterogeneous output of teacher and student networks.

Knowledge Distillation Transfer Learning

Studying Popular Open Source Machine Learning Libraries and Their Cross-Ecosystem Bindings

1 code implementation 18 Jan 2022 Hao Li, Cor-Paul Bezemer

Our study shows that the vast majority of the studied bindings cover only a small portion of the source library releases, and the delay for receiving support for a source library release is large.

BIG-bench Machine Learning

Graph Neural Networks for Double-Strand DNA Breaks Prediction

no code implementations 4 Jan 2022 Xu Wang, Huan Zhao, WeiWei Tu, Hao Li, Yu Sun, Xiaochen Bo

Double-strand DNA breaks (DSBs) are a form of DNA damage that can cause abnormal chromosomal rearrangements.

ELSA: Enhanced Local Self-Attention for Vision Transformer

1 code implementation 23 Dec 2021 Jingkai Zhou, Pichao Wang, Fan Wang, Qiong Liu, Hao Li, Rong Jin

Self-attention is powerful in modeling long-range dependencies, but it is weak in local finer-level feature learning.

Image Classification Instance Segmentation +2

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

1 code implementation 21 Dec 2021 Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.


On the Dilution of Precision for Time Difference of Arrival with Station Deployment

no code implementations 10 Dec 2021 Fengyun Zhang, Hao Li, Yulong Ding, Shuang-Hua Yang, Li Yang

The paper aims to reveal the relationship between the performance of moving object tracking algorithms and the deployment of tracking anchors (stations).

Object Tracking TAG

Design and Implementation of Real-Time Localization System (RTLS) based on UWB and TDoA Algorithm

no code implementations 9 Dec 2021 Fengyun Zhang, Li Yang, Yuhuan Liu, Yulong Ding, Shuang-Hua Yang, Hao Li

The challenges of indoor localization include inadequate localization accuracy, unreasonable anchor deployment in complex scenarios, lack of stability, and high cost.

Indoor Localization

TransZero: Attribute-guided Transformer for Zero-Shot Learning

1 code implementation 3 Dec 2021 Shiming Chen, Ziming Hong, Yang Liu, Guo-Sen Xie, Baigui Sun, Hao Li, Qinmu Peng, Ke Lu, Xinge You

Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected.

Zero-Shot Learning

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

1 code implementation CVPR 2022 Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

1 code implementation 2 Dec 2021 Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin

Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.

Ranked #2 on Unsupervised Semantic Segmentation on COCO-Stuff-171 (using extra training data)

Segmentation Self-Supervised Learning +1

3D High-Quality Magnetic Resonance Image Restoration in Clinics Using Deep Learning

no code implementations 28 Nov 2021 Hao Li, Jianan Liu

We also analyzed several down-sampling strategies based on the acceleration factor, including multiple combinations of in-plane and through-plane down-sampling, and developed a controllable and quantifiable motion artifact generation method.

Image Restoration Super-Resolution

MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection

1 code implementation 26 Nov 2021 Zhenhong Sun, Ming Lin, Xiuyu Sun, Zhiyu Tan, Hao Li, Rong Jin

Recent research attempts to reduce this cost by optimizing the backbone architecture with the help of Neural Architecture Search (NAS).

Neural Architecture Search object-detection +1

Improved Fine-Tuning by Better Leveraging Pre-Training Data

no code implementations 24 Nov 2021 Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin

The generalization result of using pre-training data shows that the excess risk bound on a target task can be improved when the appropriate pre-training data is included in fine-tuning.

Image Classification Learning Theory

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

2 code implementations 23 Nov 2021 Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks.

Ranked #1 on Unsupervised Person Re-Identification on Market-1501 (Rank-1 metric, using extra training data)

Self-Supervised Learning Unsupervised Domain Adaptation +1

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

no code implementations ICCV 2021 Tianye Li, Shichen Liu, Timo Bolkart, Jiayi Liu, Hao Li, Yajie Zhao

We propose ToFu, Topologically consistent Face from multi-view, a geometry inference framework that can produce topologically consistent meshes across facial identities and expressions using a volumetric representation instead of an explicit underlying 3DMM.

3D Reconstruction

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

2 code implementations NeurIPS 2021 Shiming Chen, Guo-Sen Xie, Yang Liu, Qinmu Peng, Baigui Sun, Hao Li, Xinge You, Ling Shao

Specifically, HSVA aligns the semantic and visual domains by adopting a hierarchical two-step adaptation, i.e., structure adaptation and distribution adaptation.

Transfer Learning Zero-Shot Learning

NAS-Bench-Zero: A Large Scale Dataset for Understanding Zero-Shot Neural Architecture Search

no code implementations 29 Sep 2021 Hanlin Chen, Ming Lin, Xiuyu Sun, Hao Li

Based on these new discoveries, we propose i) a novel hybrid zero-shot proxy which outperforms existing ones by a large margin and is transferable among popular search spaces; ii) a new index for better measuring the true performance of ZS-NAS proxies in constrained NAS.

Benchmarking Neural Architecture Search

Unsupervised Domain Adaptation By Optimal Transportation Of Clusters Between Domains

no code implementations 29 Sep 2021 Yang Liu, Zhipeng Zhou, Lei Shang, Baigui Sun, Hao Li, Rong Jin

Unsupervised domain adaptation (UDA) aims to transfer the knowledge from a labeled source domain to an unlabeled target domain.

Clustering Transfer Learning +1

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

1 code implementation 27 Sep 2021 Shizhou Zhang, De Cheng, Wenlong Luo, Yinghui Xing, Duo Long, Hao Li, Kai Niu, Guoqiang Liang, Yanning Zhang

Finding target persons in full scene images from a text description query has important practical applications in intelligent video surveillance. However, unlike real-world scenarios where bounding boxes are not available, existing text-based person retrieval methods mainly focus on cross-modal matching between the query text descriptions and a gallery of cropped pedestrian images.

Person Search Retrieval +3

Unsupervised Cross-Modality Domain Adaptation for Segmenting Vestibular Schwannoma and Cochlea with Data Augmentation and Model Ensemble

no code implementations 24 Sep 2021 Hao Li, Dewei Hu, Qibang Zhu, Kathleen E. Larson, Huahong Zhang, Ipek Oguz

To overcome this problem, domain adaptation is an effective way to leverage information from the source domain to obtain accurate segmentations without requiring manual labels in the target domain.

Data Augmentation Domain Adaptation +2

Interpolation variable rate image compression

1 code implementation 20 Sep 2021 Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, Dongyang Li, Hao Li

Compression standards have been used to reduce the cost of image storage and transmission for decades.

Image Compression MS-SSIM +1

DisUnknown: Distilling Unknown Factors for Disentanglement Learning

1 code implementation ICCV 2021 Sitao Xiang, Yuming Gu, Pengda Xiang, Menglei Chai, Hao Li, Yajie Zhao, Mingming He

In this paper, we adopt a general setting where all factors that are hard to label or identify are encapsulated as a single unknown factor.


CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

2 code implementations ICLR 2022 Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin

Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.

Unsupervised Domain Adaptation

Scaled ReLU Matters for Training Vision Transformers

no code implementations 8 Sep 2021 Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin

In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not help stabilize training, but the scaled ReLU operation in the convolutional stem (conv-stem) matters.

Dash: Semi-Supervised Learning with Dynamic Thresholding

no code implementations 1 Sep 2021 Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin

In this work we develop a simple yet powerful framework, whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models.

Semi-Supervised Image Classification

Digging into Uncertainty in Self-supervised Multi-view Stereo

1 code implementation ICCV 2021 Hongbin Xu, Zhipeng Zhou, Yali Wang, Wenxiong Kang, Baigui Sun, Hao Li, Yu Qiao

Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background.

Image Reconstruction Self-Supervised Learning

Exploring the Quality of GAN Generated Images for Person Re-Identification

no code implementations 23 Aug 2021 Yiqi Jiang, Weihua Chen, Xiuyu Sun, Xiaoyu Shi, Fan Wang, Hao Li

Recently, GAN-based methods have demonstrated strong effectiveness in generating augmentation data for person re-identification (ReID), on account of their ability to bridge the gap between domains and enrich data variety in feature space.

Person Re-Identification Unsupervised Domain Adaptation

Fine-Grained AutoAugmentation for Multi-Label Classification

no code implementations 12 Jul 2021 Ya Wang, Hesen Chen, Fangyi Zhang, Yaohua Wang, Xiuyu Sun, Ming Lin, Hao Li

Data augmentation is a commonly used approach to improving the generalization of deep learning models.

Classification Data Augmentation +3

A Cloud-Edge-Terminal Collaborative System for Temperature Measurement in COVID-19 Prevention

no code implementations 11 Jul 2021 Zheyi Ma, Hao Li, Wen Fang, Qingwen Liu, Bin Zhou, Zhiyong Bu

Then, a mobile detection model based on a multi-task cascaded convolutional network (MTCNN) is proposed to realize face alignment and mask detection on the RGB images.

Face Alignment

LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation

no code implementations 9 Jul 2021 Dewei Hu, Can Cui, Hao Li, Kathleen E. Larson, Yuankai K. Tao, Ipek Oguz

We then construct the local intensity fusion encoder (LIFE) to map a given OCT-A volume and its LIF counterpart to a shared latent space.

Retinal Vessel Segmentation Segmentation

Graph Convolution for Re-ranking in Person Re-identification

1 code implementation 5 Jul 2021 Yuqi Zhang, Qian Qi, Chong Liu, Weihua Chen, Fan Wang, Hao Li, Rong Jin

In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric.

Person Re-Identification Re-Ranking +1

Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

no code implementations CVPR 2021 Huiwen Luo, Koki Nagano, Han-Wei Kung, Mclean Goldwhite, Qingguo Xu, Zejian Wang, Lingyu Wei, Liwen Hu, Hao Li

Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person, but fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars for integration in virtual environments.

3D Face Reconstruction Face Model

SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature

1 code implementation CVPR 2021 Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang

For the fully connected layers, by utilizing the low-rank property of the Kronecker factors of the Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy.

Dimensionality Reduction

Task-Generic Hierarchical Human Motion Prior using VAEs

no code implementations 7 Jun 2021 Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, Yajie Zhao

We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation, motion completion from partial observations, and motion synthesis from sparse key-frames.

Motion Synthesis Pose Estimation