Search Results for author: Hao Jiang

Found 144 papers, 40 papers with code

Visual Prompt Tuning for Few-Shot Text Classification

no code implementations · COLING 2022 · Jingyuan Wen, Yutian Luo, Nanyi Fei, Guoxing Yang, Zhiwu Lu, Hao Jiang, Jie Jiang, Zhao Cao

In few-shot text classification, a feasible paradigm for deploying VL-PTMs is to align the input samples and their category names via the text encoders.

Few-Shot Learning, Few-Shot Text Classification

Near-Field Sensing Enabled Predictive Beamforming: Fundamentals, Framework, and Opportunities

no code implementations · 10 Jun 2025 · Hao Jiang, Zhaolin Wang, Yue Liu, Hyundong Shin, Arumugam Nallanathan, Yuanwei Liu

The article proposes a novel near-field predictive beamforming framework for high-mobility wireless networks.

Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving

no code implementations · 5 Jun 2025 · Hao Jiang, Chuan Hu, Yukang Shi, Yuan He, Ke Wang, Xi Zhang, Zhipeng Zhang

In contrast to existing VLMs with over 7B parameters and unstructured language processing (e.g., LLaVA-1.5), FastDrive understands structured and concise descriptions and generates machine-friendly driving decisions with high efficiency.

Autonomous Driving, Decision Making

SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers

3 code implementations · 1 Jun 2025 · Zhengcong Fei, Hao Jiang, Di Qiu, Baoxuan Gu, Youqiang Zhang, Jiahua Wang, Jialin Bai, Debang Li, Mingyuan Fan, Guibin Chen, Yahui Zhou

The generation and editing of audio-conditioned talking portraits guided by multimodal inputs, including text, images, and videos, remain underexplored.

Denoising

STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering

no code implementations · 28 May 2025 · Zehao Li, Hao Jiang, Yujun Cai, Jianing Chen, Baolong Bi, Shuqin Gao, Honglong Zhao, Yiwei Wang, Tianlu Mao, Zhaoqi Wang

Although dynamic scene reconstruction has long been a fundamental challenge in 3D vision, the recent emergence of 3D Gaussian Splatting (3DGS) offers a promising direction by enabling high-quality, real-time rendering through explicit Gaussian primitives.

3DGS, Dynamic Reconstruction

CiUAV: A Multi-Task 3D Indoor Localization System for UAVs based on Channel State Information

no code implementations · 27 May 2025 · Cunyi Yin, Chenwei Wang, Jing Chen, Hao Jiang, Xiren Miao, Shaocong Zheng, Zhenghua Chen, Hong Yan

The proposed system provides a cost-effective and scalable solution, demonstrating its usefulness for UAV applications in resource-constrained indoor environments.

Indoor Localization

DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving

no code implementations · 26 May 2025 · Anqing Jiang, Yu Gao, Zhigang Sun, Yiru Wang, Jijun Wang, Jinghao Chai, Qian Cao, Yuweng Heng, Hao Jiang, Yunda Dong, Zongzheng Zhang, Xianda Guo, Hao Sun, Hao Zhao

Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design, which integrates modular tasks (i.e., perception, prediction, and planning) and enables optimization in pursuit of the ultimate goal.

Autonomous Driving, Diversity

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

no code implementations · 10 May 2025 · Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Anyi Rao, Biaolong Chen, Aixi Zhang, Si Liu, Hao Jiang

To address these issues, we propose ProFashion, a fashion video generation framework leveraging multiple reference images to achieve improved view consistency and temporal coherency.

Denoising, Video Generation

Fast-Slow Thinking for Large Vision-Language Model Reasoning

no code implementations · 25 Apr 2025 · Wenyi Xiao, Leilei Gan, Weilong Dai, Wanggui He, Ziwei Huang, Haoyuan Li, Fangxun Shu, Zhelun Yu, Peng Zhang, Hao Jiang, Fei Wu

Recent advances in large vision-language models (LVLMs) have revealed an "overthinking" phenomenon, where models generate verbose reasoning across all tasks regardless of the question.

Language Modeling, Language Modelling

Pinching-Antenna System (PASS) Enhanced Covert Communications

no code implementations · 14 Apr 2025 · Hao Jiang, Zhaolin Wang, Yuanwei Liu

Capitalizing on this high degree of antenna reconfigurability, the potential of PASS for covert communications is investigated.

Position

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO

no code implementations · 31 Mar 2025 · Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jingyuan Chen, Chang Yao, Jie Song

Current reasoning methods fall into two types: PRM, which supervises the intermediate reasoning steps, and ORM, which supervises the final results.

Mathematical Reasoning, Multimodal Reasoning

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint

no code implementations · 17 Mar 2025 · Zhenlong Yuan, Zhidong Yang, Yujun Cai, Kuangxin Wu, Mufan Liu, Dapeng Zhang, Hao Jiang, Zhaoxin Li, Zhaoqi Wang

Recently, patch-deformation methods have exhibited significant effectiveness in multi-view stereo owing to the deformable and expandable patches in reconstructing textureless areas.

Panoptic Segmentation, Segmentation

Optimizing Ride-Pooling Operations with Extended Pickup and Drop-Off Flexibility

no code implementations · 11 Mar 2025 · Hao Jiang, Yixing Xu, Pradeep Varakantham

The Ride-Pool Matching Problem (RMP) is central to on-demand ride-pooling services, where vehicles must be matched with multiple requests while adhering to service constraints such as pickup delays, detour limits, and vehicle capacity.

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

no code implementations · 7 Mar 2025 · Guanghao Zhang, Tao Zhong, Yan Xia, Zhelun Yu, Haoyuan Li, Wanggui He, Fangxun Shu, Mushui Liu, Dong She, Yi Wang, Hao Jiang

CMMCoT constructs interleaved multimodal multi-step reasoning chains that use critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals.

Image Comprehension, Memorization

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

1 code implementation · 1 Mar 2025 · Shangzhe Di, Zhelun Yu, Guanghao Zhang, Haoyuan Li, Tao Zhong, Hao Cheng, Bolin Li, Wanggui He, Fangxun Shu, Hao Jiang

ReKV enables the separation of video encoding and question-answering across different processes and GPUs, significantly enhancing the efficiency of StreamingVQA.

Question Answering, Retrieval

UniGO: A Unified Graph Neural Network for Modeling Opinion Dynamics on Graphs

no code implementations · 17 Feb 2025 · Hao Li, Hao Jiang, Yuke Zheng, Hao Sun, Wenying Gong

Polarization and fragmentation in social media amplify user biases, making it increasingly important to understand the evolution of opinions.

Graph Neural Network

D²-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models

1 code implementation · 14 Jan 2025 · Qian Zeng, Jie Song, Han Zheng, Hao Jiang, Mingli Song

In this work, we propose D2-DPM, a dual denoising mechanism aimed at precisely mitigating the adverse effects of quantization noise on the noise estimation network.

Denoising, Image Generation

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

no code implementations · 27 Dec 2024 · Jiang Liu, Bolin Li, Haoyuan Li, Tianwei Lin, Wenqiao Zhang, Tao Zhong, Zhelun Yu, Jinghao Wei, Hao Cheng, Hao Jiang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang

Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices.

Cramér-Rao Bound Optimization for Near-Field Sensing with Continuous-Aperture Arrays

no code implementations · 19 Dec 2024 · Hao Jiang, Zhaolin Wang, Yuanwei Liu, Arumugam Nallanathan

A Cramér-Rao bound (CRB) optimization framework for near-field sensing (NISE) with continuous-aperture arrays (CAPAs) is proposed.

All-in-One: Transferring Vision Foundation Models into Stereo Matching

no code implementations · 13 Dec 2024 · Jingyi Zhou, Haoyu Zhang, Jiakang Yuan, Peng Ye, Tao Chen, Hao Jiang, Meiya Chen, Yangyang Zhang

Inspired by the ability of vision foundation models (VFMs) to extract general representations, in this work, we propose AIO-Stereo which can flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model.

All, Stereo Matching

COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models

no code implementations · 13 Dec 2024 · Yuchen Ren, Wenwei Han, Qianyuan Zhang, Yining Tang, Weiqiang Bai, Yuchen Cai, Lifeng Qiao, Hao Jiang, Dong Yuan, Tao Chen, Siqi Sun, Pan Tan, Wanli Ouyang, Nanqing Dong, Xinzhu Ma, Peng Ye

To address this, we introduce the first comprehensive multi-omics benchmark COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), designed to evaluate models across single-omics, cross-omics, and multi-omics tasks.

Unsupervised Cross-Domain Regression for Fine-grained 3D Game Character Reconstruction

no code implementations · 11 Dec 2024 · Qi Wen, Xiang Wen, Hao Jiang, Siqi Yang, Bingfeng Han, Tianlei Hu, Gang Chen, Shuang Li

Meanwhile, games serve as carriers of the metaverse, in which players can freely edit the facial appearance of their game characters.

Regression, Transfer Learning

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

1 code implementation · 10 Dec 2024 · Hao Li, Ruoyuan Gong, Hao Jiang

Predicting roll call votes through modeling political actors has emerged as a focus in quantitative political science and computer science.

From Principles to Practice: A Deep Dive into AI Ethics and Regulations

no code implementations · 6 Dec 2024 · Nan Sun, Yuantian Miao, Hao Jiang, Ming Ding, Jun Zhang

In the rapidly evolving domain of Artificial Intelligence (AI), the complex interaction between innovation and regulation has become an emerging focus of our society.

Ethics

Environment Reconstruction with Multi-targets Reflectors-merged Sensing Method Based on THz Single-sided Channel Characteristics

no code implementations · 5 Dec 2024 · Zhaowei Chang, Pan Tang, Jianhua Zhang, Hao Jiang, Guangyi Liu

Reconstructing the propagation environment is a vital step for THz ISAC, as it enhances the predictability of the communication channel to reduce communication overhead.

Integrated sensing and communication ISAC

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

1 code implementation · CVPR 2025 · Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jie Song

To tackle this problem, this work proposes PatchDPO that estimates the quality of image patches within each generated image and accordingly trains the model.

Image Generation, Image Reconstruction

Multi-scale Vehicle Localization In Heterogeneous Mobile Communication Networks

no code implementations · 1 Dec 2024 · Lele Cong, Kaitao Meng, Deshi Li, Hao Jiang, Liang Xu

Additionally, through SF matching by integrating the inclusion and adjacency position relationships, a multi-scale vehicle localization (MSVL) algorithm is proposed to identify vehicular road signal patterns and determine the real-time segment and coordinates.

Position

AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool

no code implementations · 6 Nov 2024 · Zhongliang Tang, Mengchen Tan, Fei Xia, Qingrong Cheng, Hao Jiang, Yongxiang Zhang

Our experimental results demonstrate the effectiveness of our system in maintaining coherence between the constructed interfaces and the original designs.

A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications

no code implementations · 21 Oct 2024 · Wenyi Xiao, Zechuan Wang, Leilei Gan, Shuai Zhao, Wanggui He, Luu Anh Tuan, Long Chen, Hao Jiang, Zhou Zhao, Fei Wu

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical.

LG-CAV: Train Any Concept Activation Vector with Language Guidance

no code implementations · 14 Oct 2024 · Qihan Huang, Jie Song, Mengqi Xue, Haofei Zhang, Bingde Hu, Huiqiong Wang, Hao Jiang, Xingen Wang, Mingli Song

To bridge the gap between vision-language model and the target model, we calculate the activation values of concept descriptions on a common pool of images (probe images) with vision-language model and utilize them as language guidance to train the LG-CAV.

Language Modeling, Language Modelling

High-Efficient Near-Field Channel Characteristics Analysis for Large-Scale MIMO Communication Systems

no code implementations · 11 Oct 2024 · Hao Jiang, Wangqi Shi, Xiao Chen, Qiuming Zhu, Zhen Chen

A comprehensive analysis is conducted to investigate the influence of the height of the BS, motion characteristics of the MR, and antenna configurations on the channel statistics.

CursorCore: Assist Programming through Aligning Anything

1 code implementation · 9 Oct 2024 · Hao Jiang, Qi Liu, Rui Li, Shengyu Ye, Shijin Wang

In this work, we propose a new conversational framework that comprehensively integrates these information sources, collect data to train our models and evaluate their performance.

Code Completion

Pyramidal Flow Matching for Efficient Video Generative Modeling

1 code implementation · 8 Oct 2024 · Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, Zhouchen Lin

Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.

Text-to-Video Generation, Video Generation

SAG: Style-Aligned Article Generation via Model Collaboration

no code implementations · 4 Oct 2024 · Chenning Xu, Fangxun Shu, Dian Jin, Jinghao Wei, Hao Jiang

In this paper, we present a novel collaborative training framework that leverages the strengths of both LLMs and SLMs for style article generation, surpassing the performance of either model alone.

Hallucination, Instruction Following

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

1 code implementation · 27 Sep 2024 · Hongzhe Huang, Jiang Liu, Zhewen Yu, Li Cai, Dian Jiao, Wenqiao Zhang, Siliang Tang, Juncheng Li, Hao Jiang, Haoyuan Li, Yueting Zhuang

Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning.

Instruction Following, Language Modeling

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

1 code implementation · 26 Sep 2024 · Qihan Huang, Siming Fu, Jinlong Liu, Hao Jiang, Yipeng Yu, Jie Song

Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest.

Object, Personalized Image Generation

GAInS: Gradient Anomaly-aware Biomedical Instance Segmentation

1 code implementation · 21 Sep 2024 · Runsheng Liu, Hao Jiang, Yanning Zhou, Huangjing Lin, Liansheng Wang, Hao Chen

Instance segmentation plays a vital role in the morphological quantification of biomedical entities such as tissues and cells, enabling precise identification and delineation of different structures.

Instance Segmentation, Segmentation

Holistic and Historical Instance Comparison for Cervical Cell Detection

no code implementations · 21 Sep 2024 · Hao Jiang, Runsheng Liu, Yanning Zhou, Huangjing Lin, Hao Chen

To this end, we propose a holistic and historical instance comparison approach for cervical cell detection.

Cell Detection, Whole Slide Images

Federated Learning with Integrated Sensing, Communication, and Computation: Frameworks and Performance Analysis

no code implementations · 17 Sep 2024 · Yipeng Liang, Qimei Chen, Hao Jiang

With the emergence of integrated sensing, communication, and computation (ISCC) in the upcoming 6G era, federated learning with ISCC (FL-ISCC), integrating sample collection, local training, and parameter exchange and aggregation, has garnered increasing interest for enhancing training efficiency.

Federated Learning

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

1 code implementation · 28 Aug 2024 · Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM).

Computational Efficiency, Hallucination

RePair: Automated Program Repair with Process-based Feedback

1 code implementation · 21 Aug 2024 · Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

The gap between concerns over program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR).

Program Repair

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

1 code implementation · 19 Aug 2024 · Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang

Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed.

Multi-Task Learning, Parameter-Efficient Fine-Tuning

Near-Field Sensing: A Low-Complexity Wavenumber-Domain Method

no code implementations · 18 Aug 2024 · Hao Jiang, Zhaolin Wang, Yuanwei Liu

A novel low-complexity wavenumber-domain method is proposed for near-field sensing (NISE).

Dynamic Neural Dowker Network: Approximating Persistent Homology in Dynamic Directed Graphs

1 code implementation · 17 Aug 2024 · Hao Li, Hao Jiang, Jiajun Fan, Dongsheng Ye, Liang Du

This paper introduces the Dynamic Neural Dowker Network (DNDN), a novel framework specifically designed to approximate the results of dynamic Dowker filtration, aiming to capture the high-order topological features of dynamic directed graphs.

Graph Classification, Graph Neural Network

Improving Network Interpretability via Explanation Consistency Evaluation

no code implementations · 8 Aug 2024 · Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

The pursuit of greater interpretability in neural networks often results in a degradation of their original performance.

Adversarial Attack

Near-Field Sensing Enabled Predictive Beamforming: From Estimation to Tracking

no code implementations · 4 Aug 2024 · Hao Jiang, Zhaolin Wang, Yuanwei Liu

It is also revealed that: 1) the proposed AGD-AO can achieve stable descent with small gradients, thereby accelerating convergence; 2) compared to far-field predictive beamforming and feedback-based schemes, both proposed methods exhibit superior performance; and 3) by incorporating multiple CPIs, the EKF method is more robust in low-SNR regions.

Beam Prediction

On the Evaluation Consistency of Attribution-based Explanations

no code implementations · 28 Jul 2024 · Jiarui Duan, Haoling Li, Haofei Zhang, Hao Jiang, Mengqi Xue, Li Sun, Mingli Song, Jie Song

Our findings underscore the necessity for future research in this domain to conduct rigorous evaluations encompassing a broader range of models and datasets, and to reassess the assumptions underlying the empirical success of different attribution methods.

Benchmarking

Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

no code implementations · 12 Jul 2024 · Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing.

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

1 code implementation · 10 Jul 2024 · Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, Leilei Gan, Hao Jiang

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis.

Image Generation, Text Generation

Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation

no code implementations · 18 Jun 2024 · Yige Shen, Hao Jiang, Hua Qu, Jihong Zhao

Despite their impressive capabilities, large language models (LLMs) often face challenges such as temporal misalignment and generating hallucinatory content.

Retrieval, Retrieval-augmented Generation

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

1 code implementation · 11 Jun 2024 · X. Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang

Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios.

Text to Image Generation, Text-to-Image Generation

Data Valuation by Leveraging Global and Local Statistical Information

no code implementations · 23 May 2024 · Xiaoling Zhou, Ou Wu, Michael K. Ng, Hao Jiang

In this paper, we demonstrate that both global and local statistical information of value distributions hold significant potential for data valuation within the context of machine learning.

Data Valuation

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

1 code implementation · 23 May 2024 · Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Liwei Chen, Hao Jiang, Yang Song, Kun Gai, Yadong Mu

Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators.

Image Generation, Personalized Image Generation

Time Matters: Enhancing Pre-trained News Recommendation Models with Robust User Dwell Time Injection

no code implementations · 21 May 2024 · Hao Jiang, Chuanzhen Li, Mingxiao An

Despite this, accurately modeling user preferences remains challenging due to the inherent uncertainty of click behaviors.

News Recommendation, Reading Comprehension

An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

2 code implementations · 15 May 2024 · Renqi Chen, Wenwei Han, Haohao Zhang, Haoyang Su, Zhefan Wang, Xiaolei Liu, Hao Jiang, Wanli Ouyang, Nanqing Dong

Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis.

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

1 code implementation · 22 Apr 2024 · Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Fangxun Shu, Hao Jiang, Linchao Zhu

The rapidly developing Large Vision Language Models (LVLMs) have shown notable capabilities on a range of multi-modal tasks, but still suffer from hallucination, where the generated text does not align with the given context, significantly restricting the use of LVLMs.

Attribute, Hallucination

An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models

no code implementations · 20 Mar 2024 · Qi Liu, Gang Guo, Jiaxin Mao, Zhicheng Dou, Ji-Rong Wen, Hao Jiang, Xinyu Zhang, Zhao Cao

Based on these findings, we then propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods on different late-interaction models.

Retrieval

Large-Scale RIS Enabled Air-Ground Channels: Near-Field Modeling and Analysis

no code implementations · 19 Mar 2024 · Hao Jiang, Wangqi Shi, Zaichen Zhang, Cunhua Pan, Qingqing Wu, Feng Shu, Ruiqi Liu, Jiangzhou Wang

Then, we develop a beam domain channel model based on the proposed sub-array partition framework for large-scale RIS-enabled UAV-to-vehicle communication systems, which can be used to efficiently capture the sparse features in RIS-enabled UAV-to-vehicle channels in both near-field and far-field ranges.

Human Activity Recognition with Low-Resolution Infrared Array Sensor Using Semi-supervised Cross-domain Neural Networks for Indoor Environment

no code implementations · 5 Mar 2024 · Cunyi Yin, Xiren Miao, Jing Chen, Hao Jiang, Deying Chen, Yixuan Tong, Shaocong Zheng

The label classifier obtained from training the source domain data improves the recognition of target domain activities due to the semi-supervised learning utilized in training the target domain data.

Domain Adaptation, Human Activity Recognition

PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station

1 code implementation · 4 Mar 2024 · Cunyi Yin, Xiren Miao, Jing Chen, Hao Jiang, Jianfei Yang, Yunjiao Zhou, Min Wu, Zhenghua Chen

WiFi-based human pose estimation is well suited to monitoring power operations because it is low-cost, device-free, and robust to varying illumination conditions. In this paper, a novel Channel State Information (CSI)-based pose estimation framework, namely PowerSkel, is developed to address these challenges.

Knowledge Distillation, Pose Estimation

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation · 5 Feb 2024 · Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Science Question Answering, Text-to-Video Generation

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

no code implementations · 12 Jan 2024 · Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang

In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle challenges in 3D reconstruction of textureless areas.

3D Reconstruction

Hybrid Vector Message Passing for Generalized Bilinear Factorization

no code implementations · 8 Jan 2024 · Hao Jiang, Xiaojun Yuan, Qinghua Guo

In this paper, we propose a new message passing algorithm that utilizes hybrid vector message passing (HVMP) to solve the generalized bilinear factorization (GBF) problem.

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

no code implementations · CVPR 2024 · Hao Jiang, Bingfeng Zhou, Yadong Mu

In this paper, we propose an innovative halftoning method termed "neural dot-controllable halftoning".

The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

no code implementations · CVPR 2024 · Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

We propose a unified multi-modal framework -- Audio-Visual Conversational Attention (AV-CONV), for the joint prediction of conversation behaviors -- speaking and listening -- for both the camera wearer as well as all other social partners present in the egocentric video.

CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer

1 code implementation · 14 Dec 2023 · Sicheng Wang, Hao Jiang, Lei Xiang

Recent deep multi-view stereo (MVS) methods have widely incorporated transformers into cascade networks for high-resolution depth estimation, achieving impressive results.

3D Reconstruction, Depth Estimation

Random resistive memory-based deep extreme point learning machine for unified visual processing

no code implementations · 14 Dec 2023 · Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Our random resistive memory-based deep extreme point learning machine may pave the way for energy-efficient and training-friendly edge AI across various data modalities and tasks.

Audio-Visual LLM for Video Understanding

no code implementations · 11 Dec 2023 · Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie

This paper presents Audio-Visual LLM, a Multimodal Large Language Model that takes both visual and auditory inputs for holistic video understanding.

AudioCaps, Language Modeling

Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data

no code implementations · 11 Dec 2023 · Lei Zhang, Fangxun Shu, Tianyang Liu, Sucheng Ren, Hao Jiang, Cihang Xie

However, the vast scale of these datasets inevitably introduces significant variability in data quality, which can adversely affect the model performance.

Image Captioning, Image-text Retrieval

Pruning random resistive memory for optimizing analogue AI

no code implementations · 13 Nov 2023 · Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network.

Audio Classification, Image Segmentation

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System

no code implementations · 11 Nov 2023 · Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He

We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.

Decision Making, Language Modelling

Channel Modeling for Heterogeneous Vehicular ISAC System with Shared Clusters

no code implementations · 16 Jul 2023 · Baiping Xiong, Zaichen Zhang, Yingmeng Ge, Haibo Wang, Hao Jiang, Liang Wu, Ziyang Zhang

In this paper, we consider the channel modeling of a heterogeneous vehicular integrated sensing and communication (ISAC) system, where a dual-functional multi-antenna base station (BS) intends to communicate with a multi-antenna vehicular receiver (MR) and sense the surrounding environments simultaneously.

Integrated sensing and communication ISAC

Training Energy-Based Models with Diffusion Contrastive Divergences

no code implementations · 4 Jul 2023 · Weijian Luo, Hao Jiang, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Zhihua Zhang

In image generation experiments, the proposed DCD is capable of training an energy-based model to generate the CelebA 32×32 dataset, with results comparable to existing EBMs.

Image Denoising, Image Generation

Video Compression with Arbitrary Rescaling Network

no code implementations · 7 Jun 2023 · Mengxi Guo, Shijie Zhao, Hao Jiang, Junlin Li, Li Zhang

Most video platforms provide video streaming services with different qualities, and the quality of the services is usually adjusted by the resolution of the videos.

Video Compression

3GPP-Like GBSM THz Channel Characterization, Modeling, and Simulation Based on Experimental Observations

no code implementations 24 May 2023 Zhaowei Chang, Jianhua Zhang, Pan Tang, Lei Tian, Hao Jiang, Ximan Liu, Guangyi Liu

Finally, we propose the THz channel model and its simulation framework to reconstruct CIRs based on the obtained models, which aim at characterizing the sparser THz channels.

Pyramid Texture Filtering

no code implementations 11 May 2023 Qing Zhang, Hao Jiang, Yongwei Nie, Wei-Shi Zheng

We present a simple but effective technique to smooth out textures while preserving the prominent structures.

Image Enhancement Tone Mapping

Egocentric Auditory Attention Localization in Conversations

no code implementations CVPR 2023 Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others.

DoNet: Deep De-overlapping Network for Cytology Instance Segmentation

1 code implementation CVPR 2023 Hao Jiang, Rushan Zhang, Yanning Zhou, Yumeng Wang, Hao Chen

Cell instance segmentation in cytology images is of significant importance for biological analysis and cancer screening, yet it remains challenging due to 1) extensive overlapping, translucent cell clusters that cause ambiguous boundaries, and 2) the confusion of mimics and debris with nuclei.

Instance Segmentation Region Proposal

Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling

no code implementations 21 Feb 2023 Xianjie Zhang, Pradeep Varakantham, Hao Jiang

Traditionally, these two challenges have been studied individually and with myopic approaches (considering only current requests), without considering the impact of current matching on addressing future requests.

Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties

no code implementations 27 Jan 2023 Hao Jiang, Tien Mai, Pradeep Varakantham, Minh Huy Hoang

Constrained Reinforcement Learning has been employed to enforce safety constraints on policies through the use of expected cost constraints.

reinforcement-learning Reinforcement Learning (RL)

Multi-Modal Experience Inspired AI Creation

1 code implementation 2 Sep 2022 Qian Cao, Xu Chen, Ruihua Song, Hao Jiang, Guang Yang, Zhao Cao

To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences.

Multimodal Deep Learning Text Generation

PReGAN: Answer Oriented Passage Ranking with Weakly Supervised GAN

no code implementations 5 Jul 2022 Pan Du, Jian-Yun Nie, Yutao Zhu, Hao Jiang, Lixin Zou, Xiaohui Yan

Beyond topical relevance, passage ranking for open-domain factoid question answering also requires a passage to contain an answer (answerability).

Passage Ranking Passage Reranking

Correction of out-of-focus microscopic images by deep learning

1 code implementation Computational and Structural Biotechnology Journal 2022 Chi Zhang, Hao Jiang, Weihuang Liu, Junyi Li, Shiming Tang, Mario Juhas, Yang Zhang

Results: To solve the out-of-focus issue in microscopy, we developed a Cycle Generative Adversarial Network (CycleGAN)-based model and a multi-component weighted loss function.

Deep Learning Generative Adversarial Network

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering

1 code implementation ACL 2022 Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Lan Luo, Ke Zhan, Enrui Hu, Xinyu Zhang, Hao Jiang, Zhao Cao, Fan Yu, Xin Jiang, Qun Liu, Lei Chen

To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).

Open-Domain Question Answering Passage Retrieval

MetaDT: Meta Decision Tree with Class Hierarchy for Interpretable Few-Shot Learning

no code implementations 3 Mar 2022 Baoquan Zhang, Hao Jiang, Xutao Li, Shanshan Feng, Yunming Ye, Rui Ye

Then, drawing on this prior, we split each few-shot task into a set of subtasks at different concept levels and then perform class prediction via a decision tree model.

Few-Shot Learning Representation Learning

KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models

no code implementations 28 Feb 2022 Daniel Gao, Yantao Jia, Lei LI, Chengzhen Fu, Zhicheng Dou, Hao Jiang, Xinyu Zhang, Lei Chen, Zhao Cao

However, to figure out whether PLMs can be reliable knowledge sources and used as alternative knowledge bases (KBs), we need to further explore some critical features of PLMs.

General Knowledge Memorization

Model Calibration of the Liquid Mercury Spallation Target using Evolutionary Neural Networks and Sparse Polynomial Expansions

no code implementations 18 Feb 2022 Majdi I. Radaideh, Hoang Tran, Lianshan Lin, Hao Jiang, Drew Winder, Sarma Gorti, Guannan Zhang, Justin Mach, Sarah Cousineau

Given that some of the calibrated parameters that show good agreement with the experimental data can be nonphysical mercury properties, we need a more advanced two-phase flow model to capture bubble dynamics and mercury cavitation.

parameter estimation

Deep Learning for Computational Cytology: A Survey

no code implementations 10 Feb 2022 Hao Jiang, Yanning Zhou, Yi Lin, Ronald CK Chan, Jiang Liu, Hao Chen

Computational cytology is a critical, rapidly developing, yet challenging topic in medical image computing that analyzes digitized cytology images with computer-aided technologies for cancer screening.

Deep Learning Medical Image Analysis

Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer

no code implementations CVPR 2022 Hao Jiang, Yadong Mu

To address it, this work explores a new solution for video summarization by transferring samples from a correlated task (i.e., video moment localization) equipped with abundant training data.

Ranked #4 on Supervised Video Summarization on SumMe (using extra training data)

Supervised Video Summarization

Semi-asynchronous Hierarchical Federated Learning for Cooperative Intelligent Transportation Systems

no code implementations 18 Oct 2021 Qimei Chen, Zehua You, Hao Jiang

Cooperative Intelligent Transport System (C-ITS) is a promising network that provides safe, efficient, sustainable, and comfortable services for automated vehicles and road infrastructure by taking advantage of its participants.

Federated Learning

Towards More Effective and Economic Sparsely-Activated Model

no code implementations 14 Oct 2021 Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

To increase the number of activated experts without increasing computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU).

model

Ego4D: Around the World in 3,000 Hours of Egocentric Video

8 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

1 code implementation NAACL 2022 Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

ELUE is dedicated to depicting the Pareto frontier for various language understanding tasks, so that it can tell whether, and by how much, a method achieves a Pareto improvement.

YES SIR!Optimizing Semantic Space of Negatives with Self-Involvement Ranker

no code implementations 14 Sep 2021 Ruizhi Pu, Xinyu Zhang, Ruofei Lai, Zikai Guo, Yinxia Zhang, Hao Jiang, Yongkang Wu, Yantao Jia, Zhicheng Dou, Zhao Cao

Finally, the supervisory signal in the rear compressor is computed based on conditional probability, which lets it control sample dynamics and further enhance model performance.

Document Ranking Information Retrieval

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

1 code implementation 24 Aug 2021 Yutao Zhu, Jian-Yun Nie, Zhicheng Dou, Zhengyi Ma, Xinyu Zhang, Pan Du, Xiaochen Zuo, Hao Jiang

To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning that takes into account possible variations in users' behavior sequences.

Contrastive Learning Data Augmentation

Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need

1 code implementation 20 Aug 2021 Zhengyi Ma, Zhicheng Dou, Wei Xu, Xinyu Zhang, Hao Jiang, Zhao Cao, Ji-Rong Wen

In this paper, we propose to leverage the large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval.

Language Modeling Language Modelling

Millimeter-Wave NR-U and WiGig Coexistence: Joint User Grouping, Beam Coordination and Power Control

no code implementations 11 Aug 2021 Xiaoxia Xu, Qimei Chen, Hao Jiang, Jun Huang

Our aim for the proposed coexistence network is to maximize spectral efficiency while ensuring the strict NR-U delay requirement and WiGig transmission performance in real-time environments.

Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals

1 code implementation 18 Jul 2021 Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Hao Jiang, Zhicheng Dou

The final response is selected according to the predicted knowledge, the goal to achieve, and the context.

Multi-Task Learning Retrieval

Graph-Embedded Multi-Agent Learning for Smart Reconfigurable THz MIMO-NOMA Networks

no code implementations 15 Jul 2021 Xiaoxia Xu, Qimei Chen, Xidong Mu, Yuanwei Liu, Hao Jiang

With the accelerated development of immersive applications and the explosive growth of Internet-of-Things (IoT) terminals, 6G would introduce terahertz (THz) massive multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) technologies to meet ultra-high-speed transmission and massive connectivity requirements.

Deep Reinforcement Learning

Answer Complex Questions: Path Ranker Is All You Need

3 code implementations Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021 Xinyu Zhang, Ke Zhan, Enrui Hu, Chengzhen Fu, Lan Luo, Hao Jiang, Yantao Jia, Fan Yu, Zhicheng Dou, Zhao Cao, Lei Chen

Currently, the most popular method for open-domain Question Answering (QA) adopts a "Retriever and Reader" pipeline, in which the retriever extracts a list of candidate documents from a large document collection, a ranker then ranks the most relevant documents, and the reader extracts the answer from the candidates.

All Open-Domain Question Answering

EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

1 code implementation 9 Jul 2021 Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer.

Speech Enhancement

Early Exiting with Ensemble Internal Classifiers

no code implementations 28 May 2021 Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

In this paper, we show that a novel objective function for the training of the ensemble internal classifiers can be naturally induced from the perspective of ensemble learning and information theory.

Diversity Ensemble Learning

Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator

no code implementations 18 May 2021 Hao Jiang, Yutao Zhu, Xinyu Zhang, Zhicheng Dou, Pan Du, Te Pi, Yantao Jia

Then we propose a dual encoder-decoder structure to model the generation of responses on both the positive and negative sides, based on changes in the user's emotional state during the conversation.

Decoder

A Semi-Supervised Classification Method of Apicomplexan Parasites and Host Cell Using Contrastive Learning Strategy

no code implementations 14 Apr 2021 Yanni Ren, Hangyu Deng, Hao Jiang, Jinglu Hu

A common shortfall of supervised learning for medical imaging is its heavy demand for human annotations, which are often expensive and time-consuming to obtain.

Contrastive Learning

Egocentric Pose Estimation from Human Vision Span

no code implementations ICCV 2021 Hao Jiang, Vamsi Krishna Ithapu

Existing approaches use either a narrow-field-of-view, front-facing camera that barely captures the wearer, or an extruded, head-mounted, top-down camera for maximal wearer visibility.

Egocentric Pose Estimation Pose Estimation

Residual-Aided End-to-End Learning of Communication System without Known Channel

no code implementations 22 Feb 2021 Hao Jiang, Shuangkaisheng Bi, Linglong Dai, Hao Wang, Jiankun Zhang

However, the vanishing-gradient and overfitting problems of GANs result in serious performance degradation in end-to-end (E2E) learning of communication systems.

Generative Adversarial Network

Market2Dish: Health-aware Food Recommendation

1 code implementation 11 Dec 2020 Wenjie Wang, Ling-Yu Duan, Hao Jiang, Peiguang Jing, Xuemeng Song, Liqiang Nie

With the rising incidence of diseases such as obesity and diabetes, healthy diets are attracting increasing attention.

Food recommendation Nutrition

Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient

no code implementations 25 Jul 2020 Haonan Jia, Xiao Zhang, Jun Xu, Wei Zeng, Hao Jiang, Xiaohui Yan, Ji-Rong Wen

Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance, resulting in unstable training and poor sample efficiency.

Q-Learning reinforcement-learning

Real-time 3D Deep Multi-Camera Tracking

no code implementations 26 Mar 2020 Quanzeng You, Hao Jiang

Our DMCT consists of 1) a fast and novel perspective-aware Deep GroudPoint Network, 2) a fusion procedure for ground-plane occupancy heatmap estimation, 3) a novel Deep Glimpse Network for person detection and 4) a fast and accurate online tracker.

Human Detection Multi-Object Tracking

Review of data analysis in vision inspection of power lines with an in-depth discussion of deep learning technology

no code implementations 22 Mar 2020 Xinyu Liu, Xiren Miao, Hao Jiang, Jing Chen

With the aim of providing a comprehensive overview for researchers who are interested in developing a deep-learning-based analysis system for power lines inspection data, this paper conducts a thorough review of the current literature and identifies the challenges for future research.

Fault Diagnosis object-detection

Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation

1 code implementation 19 Dec 2019 Jian Peng, Bo Tang, Hao Jiang, Zhuo Li, Yinjie Lei, Tao Lin, Haifeng Li

This is due to two facts: first, as the model learns more tasks, the intersection of the low-error parameter subspaces for these tasks becomes smaller or even ceases to exist; second, when the model learns a new task, the cumulative error keeps increasing as the model tries to protect the parameter configuration of previous tasks from interference.

Image Classification

Inertial nonconvex alternating minimizations for the image deblurring

no code implementations 27 Jul 2019 Tao Sun, Roberto Barrio, Marcos Rodriguez, Hao Jiang

In image processing, Total Variation (TV) regularization models are commonly used to recover blurred images.

Articles Deblurring

Heavy-ball Algorithms Always Escape Saddle Points

no code implementations 23 Jul 2019 Tao Sun, Dongsheng Li, Zhe Quan, Hao Jiang, Shengguo Li, Yong Dou

In this paper, we answer a question: can the nonconvex heavy-ball algorithms with random initialization avoid saddle points?

Real-time Multiple People Hand Localization in 4D Point Clouds

no code implementations 5 Mar 2019 Hao Jiang, Quanzeng You

Unlike traditional multi-view approaches, which find key points in 2D and then triangulate to recover 3D locations, our method directly processes dynamic 3D data that involve both clutter and crowds.

Iteratively reweighted penalty alternating minimization methods with continuation for image deblurring

no code implementations 9 Feb 2019 Tao Sun, Dongsheng Li, Hao Jiang, Zhe Quan

In this paper, we consider a class of nonconvex problems with linear constraints appearing frequently in the area of image processing.

Deblurring Image Deblurring

Ego-Downward and Ambient Video based Person Location Association

no code implementations 2 Dec 2018 Liang Yang, Hao Jiang, Jizhong Xiao, Zhouyuan Huo

To provide a possible solution to this problem, this paper proposes a camera system with both ego-downward and static third-person views that performs localization and tracking using a learning-based approach.

Diversity

Non-ergodic Convergence Analysis of Heavy-Ball Algorithms

no code implementations 5 Nov 2018 Tao Sun, Penghang Yin, Dongsheng Li, Chun Huang, Lei Guan, Hao Jiang

For objective functions satisfying a relaxed strongly convex condition, the linear convergence is established under weaker assumptions on the step size and inertial parameter than made in the existing literature.

A Detection and Segmentation Architecture for Skin Lesion Segmentation on Dermoscopy Images

no code implementations 11 Sep 2018 Chengyao Qian, Ting Liu, Hao Jiang, Zhe Wang, Pengfei Wang, Mingxin Guan, Biao Sun

This report summarises our method and validation results for the ISIC Challenge 2018 - Skin Lesion Analysis Towards Melanoma Detection - Task 1: Lesion Segmentation.

Lesion Segmentation Segmentation

Action4D: Real-time Action Recognition in the Crowd and Clutter

no code implementations 6 Jun 2018 Quanzeng You, Hao Jiang

In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in real-world settings.

Action Recognition Temporal Action Localization

Long short-term memory networks in memristor crossbars

1 code implementation 30 May 2018 Can Li, Zhongrui Wang, Mingyi Rao, Daniel Belkin, Wenhao Song, Hao Jiang, Peng Yan, Yunning Li, Peng Lin, Miao Hu, Ning Ge, John Paul Strachan, Mark Barnell, Qing Wu, R. Stanley Williams, J. Joshua Yang, Qiangfei Xia

Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units have led to major advances in artificial intelligence.

Emerging Technologies Applied Physics

A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

no code implementations 12 Sep 2017 Tao Sun, Hao Jiang, Li-Zhi Cheng, Wei Zhu

In fact, many classical inexact nonconvex and nonsmooth algorithms satisfy these three conditions.

Iteratively Linearized Reweighted Alternating Direction Method of Multipliers for a Class of Nonconvex Problems

no code implementations 1 Sep 2017 Tao Sun, Hao Jiang, Lizhi Cheng, Wei Zhu

The traditional alternating direction method of multipliers runs into both mathematical and computational difficulties when solving the nonconvex and nonsmooth subproblem.

Research on Bi-mode Biometrics Based on Deep Learning

no code implementations 16 May 2017 Hao Jiang

Because biological characteristics are highly distinctive on their own, biometric identification technology touches almost every area in which humans need to be distinguished.

Deep Learning

Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly

no code implementations CVPR 2017 Hao Jiang, Kristen Grauman

In addition, we demonstrate its impact on a proxemics recognition task, which demands a precise representation of "whose body part is where" in crowded images.

Human Detection Semantic Segmentation

Matching Bags of Regions in RGBD images

no code implementations CVPR 2015 Hao Jiang

We study the new problem of matching regions between a pair of RGBD images given a large set of overlapping region proposals.

A Linear Approach to Matching Cuboids in RGBD Images

no code implementations CVPR 2013 Hao Jiang, Jianxiong Xiao

We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect.
