Search Results for author: Fan Zhang

Found 303 papers, 83 papers with code

Research on the Natural Language Normalness of Software Identifiers

no code implementations CCL 2021 Dongzhen Wen, Fan Zhang, Xiao Zhang, Liang Yang, Yuan Lin, Bo Xu, Hongfei Lin

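“Understanding software source code is central to collaborative software development and maintenance, and understanding identifiers, which make up more than half of source code, plays an important role in software comprehension. Traditional software engineering mainly studies how naming conventions can constrain the identifier naming process to produce identifiers that are easier to understand and communicate. Building on a review and analysis of the naming conventions of common programming languages, this paper proposes a new evaluation criterion for identifier comprehensibility. Specifically, we first summarize the naming conventions of mainstream programming languages and, by analogy with the concept of the morpheme in natural language, propose an identifier composition process based on software morphemes: the composition of an identifier can be viewed as the generation, arrangement, and concatenation of software morphemes. On this basis, we propose a method for evaluating the normalness of software identifiers that draws on natural language corpora, to measure whether an identifier is easy to understand. Finally, we conduct validation experiments on the normalness metric using a source code comprehension dataset and open-source projects on GitHub; the results show that the proposed normalness score measures the comprehensibility of software projects well.”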

Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits

no code implementations ICML 2020 Nian Si, Fan Zhang, Zhengyuan Zhou, Jose Blanchet

We first present a policy evaluation procedure in the ambiguous environment and also give a heuristic algorithm to solve the distributionally robust policy learning problems efficiently.

Multi-Armed Bandits

EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

1 code implementation 20 Jan 2025 Guankun Wang, Long Bai, Junyi Wang, Kun Yuan, Zhen Li, Tianxu Jiang, Xiting He, Jinlin Wu, Zhen Chen, Zhen Lei, Hongbin Liu, Jiazheng Wang, Fan Zhang, Nicolas Padoy, Nassir Navab, Hongliang Ren

Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense potential in computer-aided diagnosis and decision-making.

UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion

no code implementations 20 Jan 2025 Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue

In this work, we propose UltraFusion, the first exposure fusion technique that can merge inputs with 9-stop differences.

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

no code implementations 12 Jan 2025 Ruizhe Ou, Yuan Hu, Fan Zhang, Jiaxin Chen, Yu Liu

In addition, to address the absence of large-scale datasets for training pixel-level RS MLLMs, we construct the GeoPixInstruct dataset, comprising 65,463 images and 140,412 instances, with each instance annotated with text descriptions, bounding boxes, and masks.

Image Captioning Language Modeling +7

X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding

no code implementations 12 Jan 2025 Wenqi Zhou, Kai Cao, Hao Zheng, Xinyi Zheng, Miao Liu, Per Ola Kristensson, Walterio Mayol-Cuevas, Fan Zhang, Weizhe Lin, Junxiao Shen

Leveraging the advanced text processing capabilities of large language models (LLMs), X-LeBench develops a life-logging simulation pipeline that produces realistic, coherent daily plans aligned with real-world video data.

Video Understanding

FlipedRAG: Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

no code implementations 6 Jan 2025 Zhuo Chen, Yuyang Gong, Miaokun Chen, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu, Jiawei Liu

By leveraging instruction engineering, we obtain partial retrieval model outputs from a black-box RAG system, facilitating the training of surrogate models to enhance the effectiveness of opinion manipulation attacks.

Hallucination Question Answering +2

Artificial Intelligence in Creative Industries: Advances Prior to 2025

no code implementations 6 Jan 2025 Nantheera Anantrasirichai, Fan Zhang, David Bull

This paper explores the significant technological shifts since our previous review in 2022, highlighting how these developments have expanded creative opportunities and efficiency.

Data Compression multimodal generation +1

SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation

1 code implementation 20 Dec 2024 Ke Yan, Qing Cai, Fan Zhang, Ziyan Cao, Zhi Liu

To address these issues, we propose a novel Semantic-Guided Triplet Co-training (SGTC) framework, which achieves high-end medical image segmentation by only annotating three orthogonal slices of a few volumetric samples, significantly alleviating the burden of radiologists.

Auxiliary Learning Image Segmentation +4

Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment

1 code implementation 15 Dec 2024 Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang

Medical image segmentation is a critical component of clinical practice, and the state-of-the-art MedSAM model has significantly advanced this field.

Image Segmentation Medical Image Segmentation +2

Efficient Gravitational Wave Parameter Estimation via Knowledge Distillation: A ResNet1D-IAF Approach

no code implementations 11 Dec 2024 Xihua Zhu, Yiqian Yang, Fan Zhang

With the rapid development of gravitational wave astronomy, the increasing number of detected events necessitates efficient methods for parameter estimation and model updates.

Astronomy Computational Efficiency +2

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

1 code implementation 10 Dec 2024 Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu

Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations.

Video Generation

Adaptive Epsilon Adversarial Training for Robust Gravitational Wave Parameter Estimation Using Normalizing Flows

no code implementations 10 Dec 2024 Yiqian Yang, Xihua Zhu, Fan Zhang

Adversarial training with Normalizing Flow (NF) models is an emerging research area aimed at improving model robustness through adversarial samples.

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution

no code implementations 4 Dec 2024 YuXuan Jiang, Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull

Recent advances in implicit neural representations (INRs) have shown significant promise in modeling visual signals for various low-vision tasks including image super-resolution (ISR).

Image Super-Resolution

DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2

no code implementations 23 Nov 2024 Fan Zhang, Siyuan Zhao, Naye Ji, Zhaohan Wang, Jingmei Wu, Fuxing Gao, Zhenqing Ye, Leyao Yan, Lanxin Dai, Weidong Geng, Xin Lyu, Bozuo Zhao, Dingguo Yu, Hui Du, Bin Hu

DiM-Gestor features a dual-component framework: (1) a fuzzy feature extractor and (2) a speech-to-gesture mapping module, both built on the Mamba-2.

Gesture Generation Mamba

RTSR: A Real-Time Super-Resolution Model for AV1 Compressed Content

no code implementations 20 Nov 2024 YuXuan Jiang, Jakub Nawała, Chen Feng, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull

To address this issue, this paper proposes a low-complexity SR method, RTSR, designed to enhance the visual quality of compressed video content, focusing on resolution up-scaling from a) 360p to 1080p and from b) 540p to 4K.

4k Knowledge Distillation +3

AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression

no code implementations 19 Nov 2024 Zhixiang Wang, Xudong Li, Yizhai Zhang, Fan Zhang, Panfeng

Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments.

Motion Estimation regression

BVI-CR: A Multi-View Human Dataset for Volumetric Video Compression

no code implementations 17 Nov 2024 Ge Gao, Adrian Azzarelli, Ho Man Kwan, Nantheera Anantrasirichai, Fan Zhang, Oliver Moolan-Feroze, David Bull

However, the development and validation of efficient 3D data compression methods are constrained by the lack of comprehensive and high-quality volumetric video datasets, which typically require much more effort to acquire and consume increased resources compared to 2D image and video databases.

3D Reconstruction Data Compression +1

Human-inspired Perspectives: A Survey on AI Long-term Memory

no code implementations 1 Nov 2024 Zihong He, Weizhe Lin, Hao Zheng, Fan Zhang, Matt W. Jones, Laurence Aitchison, Xuhai Xu, Miao Liu, Per Ola Kristensson, Junxiao Shen

With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant.

Survey

AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection

no code implementations 30 Oct 2024 Yujin Wang, Tianyi Xu, Fan Zhang, Tianfan Xue, Jinwei Gu

Based on this, AdaptiveISP utilizes deep reinforcement learning to automatically generate an optimal ISP pipeline and the associated ISP parameters to maximize the detection performance.

Deep Reinforcement Learning object-detection +1

TractShapeNet: Efficient Multi-Shape Learning with 3D Tractography Point Clouds

1 code implementation 29 Oct 2024 Yui Lo, Yuqian Chen, Dongnan Liu, Jon Haitz Legarreta, Leo Zekelman, Fan Zhang, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'Donnell

In this work, we investigate the possibility of utilizing a deep learning model to compute shape measures of the brain's white matter connections.

Resolution Enhancement of Under-sampled Photoacoustic Microscopy Images using Implicit Neural Representations

no code implementations 15 Oct 2024 Youshen Xiao, Sheng Liao, Xuanyang Tian, Fan Zhang, Xinlong Dong, Yunhui Jiang, Xiyu Chen, Ruixi Sun, Yuyao Zhang, Fei Gao

Acoustic-Resolution Photoacoustic Microscopy (AR-PAM) is promising for subcutaneous vascular imaging, but its spatial resolution is constrained by the Point Spread Function (PSF).

SSIM

Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation

no code implementations 11 Oct 2024 Chen Xu, Qiming Huang, Yuqi Hou, Jiangxing Wu, Fan Zhang, Hyung Jin Chang, Jianbo Jiao

Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts, especially for low- and middle-income countries (LMICs).

General Knowledge Image Segmentation +3

UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction

no code implementations 2 Oct 2024 Haoran Wang, Nantheera Anantrasirichai, Fan Zhang, David Bull

3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering.

DualDn: Dual-domain Denoising via Differentiable ISP

1 code implementation 27 Sep 2024 Ruikang Li, Yujin Wang, Shiqi Chen, Fan Zhang, Jinwei Gu, Tianfan Xue

The raw domain denoising adapts to sensor-specific noise as well as spatially varying noise levels, while the sRGB domain denoising adapts to ISP variations and removes residual noise amplified by the ISP.

Image Denoising

Emu3: Next-Token Prediction is All You Need

2 code implementations 27 Sep 2024 Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, BoWen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang

While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs).

Visual Question Answering

Cloud Adversarial Example Generation for Remote Sensing Image Classification

no code implementations 21 Sep 2024 Fei Ma, Yuqiang Feng, Fan Zhang, Yongsheng Zhou

Common Perlin noise based cloud generation is a random, non-optimizable process, which cannot be directly used to attack the target models.

Adversarial Attack Adversarial Defense +2

NVRC: Neural Video Representation Compression

no code implementations 11 Sep 2024 Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation.

Model Compression Quantization +1

Renormalized Connection for Scale-preferred Object Detection in Satellite Imagery

no code implementations 9 Sep 2024 Fan Zhang, Lingling Li, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Biao Hou

In a series of FPN experiments on the scale-preferred tasks, we found that the "divide-and-conquer" idea of FPN severely hampers the detector's learning in the right direction due to the large number of large-scale negative samples and interference from background noise.

object-detection Object Detection

Transmit Beamforming Design for ISAC with Stacked Intelligent Metasurfaces

no code implementations 5 Sep 2024 Shunyu Li, Fan Zhang, Tianqi Mao, Rui Na, Zhaocheng Wang, George K. Karagiannidis

This paper proposes a transmit beamforming strategy for the integrated sensing and communication (ISAC) systems enabled by the novel stacked intelligent metasurface (SIM) architecture, where the base station (BS) simultaneously performs downlink communication and radar target detection via different beams.

Affordance-based Robot Manipulation with Flow Matching

no code implementations 2 Sep 2024 Fan Zhang, Michael Gienger

We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model.

Robot Manipulation

PNVC: Towards Practical INR-based Video Compression

no code implementations 2 Sep 2024 Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance.

Video Compression

When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

no code implementations 23 Aug 2024 Xi Zhu, Wei Zhang, Yijie Li, Lauren J. O'Donnell, Fan Zhang

This achievement underscores a substantial progression in enhancing dMRI quality, highlighting the potential of our novel generative approach to revolutionize dMRI imaging standards.

BVI-UGC: A Video Quality Database for User-Generated Content Transcoding

no code implementations 13 Aug 2024 Zihao Qi, Chen Feng, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

Based on this collected subjective data, we benchmarked the performance of 10 full-reference and 11 no-reference quality metrics.

Video Quality Assessment

Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

no code implementations 10 Aug 2024 Fan Zhang, Ziyue Ji, Weiguang Kang, Weiqing Li, Zhiyong Su

Specifically, based on the construction of a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with pre-defined keypoints.

3D Reconstruction Single-View 3D Reconstruction

Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration

no code implementations 9 Aug 2024 Siyue Teng, YuXuan Jiang, Ge Gao, Fan Zhang, Thomas Davis, Zoe Liu, David Bull

Recent advances in video compression have seen significant coding performance improvements with the development of new standards and learning-based video codecs.

Benchmarking Video Compression

BVI-AOM: A New Training Dataset for Deep Video Compression Optimization

1 code implementation 6 Aug 2024 Jakub Nawała, YuXuan Jiang, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull

Deep learning is now playing an important role in enhancing the performance of conventional hybrid video codecs.

Video Compression

DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework

no code implementations 1 Aug 2024 Fan Zhang, Naye Ji, Fuxing Gao, Bozuo Zhao, Jingmei Wu, Yanbing Jiang, Hui Du, Zhenqing Ye, Jiayang Zhu, WeiFan Zhong, Leyao Yan, Xiaomeng Ma

Speech-driven gesture generation is an emerging domain within virtual human creation, where current methods predominantly utilize Transformer-based architectures that necessitate extensive memory and are characterized by slow inference speeds.

Gesture Generation Mamba

Diffusion Feedback Helps CLIP See Better

1 code implementation 29 Jul 2024 Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks.

Image Classification

Deep multimodal saliency parcellation of cerebellar pathways: linking microstructure and individual function through explainable multitask learning

no code implementations 21 Jul 2024 Ari Tchetchenian, Leo Zekelman, Yuqian Chen, Jarrett Rushmore, Fan Zhang, Edward H. Yeterian, Nikos Makris, Yogesh Rathi, Erik Meijering, Yang Song, Lauren J. O'Donnell

We refer to our method as Deep Multimodal Saliency Parcellation (DeepMSP), as it computes the saliency of structural measures for predicting cognitive and motor functional performance, with these saliencies being applied to the task of parcellation.

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

no code implementations 18 Jul 2024 Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks.

Decision Making Hallucination +2

TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

no code implementations 11 Jul 2024 Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

The proposed approach highlights the potential of integrating local anatomical information and global feature dependencies to improve prediction performance in machine learning with diffusion MRI tractography.

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

1 code implementation 11 Jul 2024 Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan

To facilitate the cutting-edge research of MLLMs on comprehensive vision perception, we thereby propose Perceptual Fusion, using a low-budget but highly effective caption engine for complete and accurate image descriptions.

Visual Question Answering

DaBiT: Depth and Blur informed Transformer for Joint Refocusing and Super-Resolution

no code implementations 1 Jul 2024 Crispian Morris, Nantheera Anantrasirichai, Fan Zhang, David Bull

In many real-world scenarios, recorded videos suffer from accidental focus blur, and while video deblurring methods exist, most specifically target motion blur.

Deblurring Video Deblurring +4

GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks

1 code implementation 19 Jun 2024 Fan Zhang, Xin Zhang

A massive number of applications involve data with underlying relationships embedded in non-Euclidean space.

Kolmogorov-Arnold Networks

Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

no code implementations 13 Jun 2024 Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects.

Object Tracking

MVAD: A Multiple Visual Artifact Detector for Video Streaming

no code implementations 31 May 2024 Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andrew Collins, David Bull

In this paper, we propose a Multiple Visual Artifact Detector, MVAD, for video streaming which, for the first time, is able to detect multiple artifacts using a single framework that is not reliant on video quality assessment models.

Data Augmentation Video Quality Assessment

FedMAP: Unlocking Potential in Personalized Federated Learning through Bi-Level MAP Optimization

1 code implementation 29 May 2024 Fan Zhang, Carlos Esteve-Yagüe, Sören Dittmer, Carola-Bibiane Schönlieb, Michael Roberts

This study contributes to PFL by establishing a solid theoretical foundation for the proposed method and offering a robust, ready-to-use framework that effectively addresses the challenges posed by non-IID data in FL.

Personalized Federated Learning

EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

1 code implementation 21 May 2024 Yihong Huang, Yuang Zhang, Liping Wang, Fan Zhang, Xuemin Lin

Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible.

Outlier Detection

DEMO: A Statistical Perspective for Efficient Image-Text Matching

no code implementations 19 May 2024 Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo

Image-text matching has been a long-standing problem, which seeks to connect vision and language through semantic understanding.

Image-text matching Model Optimization +3

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

no code implementations 14 May 2024 Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, Benoit Vallade, Alex Mackin, David Bull

The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation.

Contrastive Learning Video Enhancement +2

Minimal Evidence Group Identification for Claim Verification

no code implementations 24 Apr 2024 Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang, Fan Zhang

Claim verification in real-world settings (e.g., against a large collection of candidate evidences retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim.

Claim Verification

Disturbance Rejection-Guarded Learning for Vibration Suppression of Two-Inertia Systems

no code implementations 16 Apr 2024 Fan Zhang, Jinfeng Chen, Yu Hu, Zhiqiang Gao, Ge Lv, Qin Lin

On the other hand, machine learning benefits from an additional assurance layer provided by the ESO, as any imperfections in the machine learning model can be compensated for by the ESO.

MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

no code implementations 15 Apr 2024 YuXuan Jiang, Chen Feng, Fan Zhang, David Bull

Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from their high-performance but more complex teacher variant.

Image Super-Resolution Knowledge Distillation

A diffusion MRI tractography atlas for concurrent white matter mapping across Eastern and Western populations

no code implementations 6 Apr 2024 Yijie Li, Wei Zhang, Ye Wu, Li Yin, Ce Zhu, Yuqian Chen, Suheyla Cetin-Karayumak, Kang Ik K Cho, Leo R. Zekelman, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Lauren J. O'Donnell, Fan Zhang

However, a comprehensive investigation into WM fiber tracts between Eastern and Western populations is challenged due to the lack of a cross-population WM atlas and the large site-specific variability of dMRI data.

Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

1 code implementation 22 Mar 2024 Shuhao Li, Yue Cui, Jingyi Xu, Libin Li, Lingkai Meng, Weidong Yang, Fan Zhang, Xiaofang Zhou

Traffic prediction has long been a focal and pivotal area in research, witnessing significant strides from city-level to road-level predictions in recent years.

Autonomous Driving Traffic Prediction

Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

1 code implementation 14 Mar 2024 Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

Additionally, we find that most of the solutions to long-tailed problems are still biased towards head classes in the end, and we propose a simple, post hoc prediction re-balancing strategy to further mitigate the bias toward head classes.

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

no code implementations CVPR 2024 QiHao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories.

Immersive Video Compression using Implicit Neural Representations

1 code implementation 2 Feb 2024 Ho Man Kwan, Fan Zhang, Andrew Gower, David Bull

In this paper we, for the first time, extend their application to immersive (multi-view) videos, by proposing MV-HiNeRV, a new INR-based immersive video codec.

Video Compression

3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework

no code implementations 14 Jan 2024 Fan Zhang, Shuyi Mao, Qing Li, Xiaojiang Peng

Comparative evaluations with popular point-based methods on HPoint103 and the public dataset DHP19 demonstrate the dramatic outperformance of our D-CPT.

Decoder Pose Estimation +1

MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition

no code implementations 14 Jan 2024 Fan Zhang, Xiaobao Guo, Xiaojiang Peng, Alex Kot

In addition, when compared with the domain disparity existing between face datasets and FER datasets, the divergence between general datasets and FER datasets is more pronounced.

Contrastive Learning Face Recognition +3

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

2 code implementations 10 Jan 2024 Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu

Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

Task Planning

Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval

no code implementations CVPR 2024 Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo

In this paper we propose a semi-supervised approach named Fine-grained Prototypical Voting with Heterogeneous Mixup (FIVE) which maps both 2D and 3D data into a common embedding space for cross-modal retrieval.

Cross-Modal Retrieval Retrieval

Compressing Deep Image Super-resolution Models

no code implementations 31 Dec 2023 YuXuan Jiang, Jakub Nawala, Fan Zhang, David Bull

Deep learning techniques have been applied in the context of image super-resolution (SR), achieving remarkable advances in terms of reconstruction performance.

Image Super-Resolution Knowledge Distillation

Emage: Non-Autoregressive Text-to-Image Generation

no code implementations 22 Dec 2023 Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi

Compared with autoregressive baselines that need to run one thousand times, our model only runs 16 times to generate images of competitive quality with an order of magnitude lower inference latency.

Denoising Text-to-Image Generation

GreenScan: Towards large-scale terrestrial monitoring the health of urban trees using mobile sensing

no code implementations 22 Dec 2023 Akshit Gupta, Simone Mora, Fan Zhang, Martine Rutten, R. Venkatesha Prasad, Carlo Ratti

Healthy urban greenery is a fundamental asset to mitigate climate change phenomena such as extreme heat and air pollution.

Generative Multimodal Models are In-Context Learners

1 code implementation CVPR 2024 Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate.

In-Context Learning Personalized Image Generation +3

Full-reference Video Quality Assessment for User Generated Content Transcoding

no code implementations 19 Dec 2023 Zihao Qi, Chen Feng, Duolikun Danier, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

In this work, we observe that existing full-/no-reference quality metrics fail to accurately predict the perceptual quality difference between transcoded UGC content and the corresponding unpristine references.

Video Quality Assessment Visual Question Answering (VQA)

Device Scheduling for Relay-assisted Over-the-Air Aggregation in Federated Learning

no code implementations 15 Dec 2023 Fan Zhang, Jining Chen, Kunlun Wang, Wen Chen

We formulate a joint device scheduling and power allocation problem to maximize the number of scheduled devices.

Federated Learning Scheduling

BVI-Artefact: An Artefact Detection Benchmark Dataset for Streamed Videos

no code implementations 14 Dec 2023 Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andy Collins, David Bull

Professionally generated content (PGC) streamed online can contain visual artefacts that degrade the quality of user experience.

RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment

no code implementations 14 Dec 2023 Chen Feng, Duolikun Danier, Haoran Wang, Fan Zhang, Benoit Vallade, Alex Mackin, David Bull

Deep learning-based video quality assessment (deep VQA) has demonstrated significant potential in surpassing conventional metrics, with promising improvements in terms of correlation with human perception.

Knowledge Distillation Model Compression +2

A Simple Framework to Enhance the Adversarial Robustness of Deep Learning-based Intrusion Detection System

no code implementations 6 Dec 2023 Xinwei Yuan, Shu Han, Wei Huang, Hongliang Ye, Xianglong Kong, Fan Zhang

In this paper, we propose a novel IDS architecture that can enhance the robustness of IDS against adversarial attacks by combining conventional machine learning (ML) models and Deep Learning models.

Adversarial Attack Adversarial Robustness +2

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

no code implementations 5 Dec 2023 Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in terms of compression efficiency.

Decoder Video Compression

How does spatial structure affect psychological restoration? A method based on Graph Neural Networks and Street View Imagery

1 code implementation 29 Nov 2023 Haoran Ma, Yan Zhang, Pengyuan Liu, Fan Zhang, Pengyu Zhu

In this work, a spatial-dependent graph neural networks (GNNs) approach is proposed to reveal the relation between spatial structure and restoration quality on an urban scale.

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation CVPR 2024 Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation Video Generation

A Novel Deep Clustering Framework for Fine-Scale Parcellation of Amygdala Using dMRI Tractography

no code implementations 25 Nov 2023 Haolin He, Ce Zhu, Le Zhang, Yipeng Liu, Xiao Xu, Yuqian Chen, Leo Zekelman, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Lauren J. O'Donnell, Fan Zhang

The amygdala plays a vital role in emotional processing and exhibits structural diversity that necessitates fine-scale parcellation for a comprehensive understanding of its anatomico-functional correlations.

Clustering Deep Clustering +2

Cross-Domain Dual-Functional OFDM Waveform Design for Accurate Sensing/Positioning

no code implementations 8 Nov 2023 Fan Zhang, Tianqi Mao, Ruiqi Liu, Zhu Han, Sheng Chen, Zhaocheng Wang

For the communication-centric design, to maximize the achievable data rate, a fraction of REs are optimally allocated for communications according to prior knowledge of the communication channel.

CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation CVPR 2024 Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge

Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery

1 code implementation 29 Oct 2023 Zhen Qian, Min Chen, Zhuo Sun, Fan Zhang, Qingsong Xu, Jinzhao Guo, Zhiwei Xie, Zhixin Zhang

Understanding urban dynamics and promoting sustainable development requires comprehensive insights about buildings.

Planning with Logical Graph-based Language Model for Instruction Generation

no code implementations 26 Aug 2023 Fan Zhang, Kebing Jin, Hankz Hankui Zhuo

Despite the superior performance of large language models in generating natural language texts, it is hard to generate texts with correct logic for a given task, because neural models have difficulty capturing implied rules from free-form texts.

Language Modeling Language Modelling +2

MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition

1 code implementation ICCV 2023 QiHao Zhao, Chen Jiang, Wei Hu, Fan Zhang, Jun Liu

In the analysis and ablation study, we demonstrate that our method compared with previous work can effectively increase the diversity of experts, significantly reduce the variance of the model, and improve recognition accuracy.

Diversity Long-tail Learning

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model

no code implementations 11 Aug 2023 Fan Zhang, Naye Ji, Fuxing Gao, Siyuan Zhao, Zhaohan Wang, Shunman Li

Firstly, considering that speech audio not only contains acoustic and semantic features but also conveys personality traits, emotions, and more subtle information related to accompanying gestures, we pioneer the adaptation of WavLM, a large-scale pre-trained model, to extract low-level and high-level audio information.

Gesture Generation

Deep neural networks from the perspective of ergodic theory

no code implementations 4 Aug 2023 Fan Zhang

The design of deep neural networks remains somewhat of an art rather than a precise science.

TractCloud: Registration-free tractography parcellation with a novel local-global streamline point cloud representation

no code implementations 18 Jul 2023 Tengfei Xue, Yuqian Chen, Chaoyi Zhang, Alexandra J. Golby, Nikos Makris, Yogesh Rathi, Weidong Cai, Fan Zhang, Lauren J. O'Donnell

TractCloud achieves efficient and consistent whole-brain white matter parcellation across the lifespan (from neonates to elderly subjects, including brain tumor patients) without the need for registration.

Anatomy

Data-Driven Optimal Control of Tethered Space Robot Deployment with Learning Based Koopman Operator

no code implementations 15 Jul 2023 Ao Jin, Fan Zhang, Panfeng Huang

To avoid complex constraints of the traditional nonlinear method for tethered space robot (TSR) deployment, this paper proposes a data-driven optimal control framework with an improved deep learning based Koopman operator that could be applied to complex environments.

ATWM: Defense against adversarial malware based on adversarial training

no code implementations 11 Jul 2023 Kun Li, Fan Zhang, Wei Guo

In order to defend against malware attacks, researchers have proposed many Windows malware detection models based on deep learning.

Adversarial Defense Deep Learning +1

Emu: Generative Pretraining in Multimodality

2 code implementations 11 Jul 2023 Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Image Captioning Image to text +5

TractGeoNet: A geometric deep learning framework for pointwise analysis of tract microstructure to predict language assessment performance

no code implementations 8 Jul 2023 Yuqian Chen, Leo R. Zekelman, Chaoyi Zhang, Tengfei Xue, Yang song, Nikos Makris, Yogesh Rathi, Alexandra J. Golby, Weidong Cai, Fan Zhang, Lauren J. O'Donnell

We evaluate the effectiveness of the proposed method by predicting individual performance on two neuropsychological assessments of language using a dataset of 20 association white matter fiber tracts from 806 subjects from the Human Connectome Project.

regression