Search Results for author: Hao Li

Found 400 papers, 175 papers with code

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

no code implementations • 2 Dec 2024 • Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

Despite the significant progress made by all-in-one models in universal image restoration, existing methods suffer from a generalization bottleneck in real-world scenarios, as they are mostly trained on small-scale synthetic datasets with limited degradations.

Image Restoration Incremental Learning

VLSBench: Unveiling Visual Leakage in Multimodal Safety

no code implementations • 29 Nov 2024 • Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao

To explain such a counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has been revealed in the textual query.

Visual SLAMMOT Considering Multiple Motion Models

no code implementations • 28 Nov 2024 • Peilin Tian, Hao Li

Specifically, we propose a solution of visual SLAMMOT considering multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.

Autonomous Driving Multi-Object Tracking +1

DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes

no code implementations • 19 Nov 2024 • Hao Li, Yuanyuan Gao, Haosong Peng, Chenming Wu, Weicai Ye, Yufeng Zhan, Chen Zhao, Dingwen Zhang, Jingdong Wang, Junwei Han

This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.

Novel View Synthesis

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

2 code implementations • 15 Nov 2024 • Guowei Xu, Peng Jin, Hao Li, Yibing Song, Lichao Sun, Li Yuan

Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1.

Logical Reasoning Multimodal Reasoning +2

Motion Control for Enhanced Complex Action Video Generation

no code implementations • 13 Nov 2024 • Qiang Zhou, Shaofeng Zhang, Nianzu Yang, Ye Qian, Hao Li

Furthermore, MVideo supports motion condition editing and composition, facilitating the generation of videos with more complex actions.

Motion Generation Video Generation

Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs

no code implementations • 7 Nov 2024 • Chengxin Hu, Hao Li

We then explore the effect of various feature levels on performance, finding that both the quality of LLM-generated molecules and performance on different tasks benefit from different feature levels.

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

1 code implementation • 30 Oct 2024 • Hao Li, Xiaogeng Liu

NotInject contains 339 benign samples enriched with trigger words common in prompt injection attacks, enabling fine-grained evaluation.

Benchmarking

Fidelity-Imposed Displacement Editing for the Learn2Reg 2024 SHG-BF Challenge

no code implementations • 28 Oct 2024 • Jiacheng Wang, Xiang Chen, Renjiu Hu, Rongguang Wang, Min Liu, Yaonan Wang, Jiazheng Wang, Hao Li, Hang Zhang

Co-examination of second-harmonic generation (SHG) and bright-field (BF) microscopy enables the differentiation of tissue components and collagen fibers, aiding the analysis of human breast and pancreatic cancer tissues.

Contrastive Learning

Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification

no code implementations • 26 Oct 2024 • Yue Su, Hao Li, Maoguo Gong

This black-box, self-supervised approach ensures the generalizability of our attack against various VI-ReID models.

Adversarial Attack

STTATTS: Unified Speech-To-Text And Text-To-Speech Model

1 code implementation • 24 Oct 2024 • Hawau Olamide Toyin, Hao Li, Hanan Aldarmaki

Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks.

Multi-Task Learning speech-recognition +3

Deep Learning-based Detection of Bacterial Swarm Motion Using a Single Image

no code implementations • 19 Oct 2024 • Yuzhu Li, Hao Li, WeiJie Chen, Keelan O'Riordan, Neha Mani, Yuxuan Qi, Tairan Liu, Sridhar Mani, Aydogan Ozcan

It blindly achieved a sensitivity of 97.92% and a specificity of 96.77% for DB10, and a sensitivity of 100% and a specificity of 97.22% for H6.

Specificity

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

1 code implementation • 17 Oct 2024 • Rongyao Fang, Chengqi Duan, Kun Wang, Hao Li, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Hongsheng Li, Xihui Liu

This work represents a significant step towards a truly unified MLLM capable of adapting to the granularity demands of various visual tasks.

Diversity Image Manipulation +1

Whisker-Inspired Tactile Sensing: A Sim2Real Approach for Precise Underwater Contact Tracking

no code implementations • 17 Oct 2024 • Hao Li, Chengyi Xing, Saad Khan, Miaoya Zhong, Mark R. Cutkosky

Aquatic mammals, such as pinnipeds, utilize their whiskers to detect and discriminate objects and analyze water movements, inspiring the development of robotic whiskers for sensing contacts, surfaces, and water flows.

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

1 code implementation • 14 Oct 2024 • Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao

This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions, where malicious users can obscure harmful intents across several queries.

LLM Jailbreak Safety Alignment

Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models

1 code implementation • 11 Oct 2024 • Hao Li, Cor-Paul Bezemer, Ahmed E. Hassan

Our study not only enriches the body of knowledge on practical applications of FM4SE and SE4FM but also demonstrates the utility of FMs as a powerful and efficient approach in conducting literature surveys within technical and grey literature domains.

Code Generation

CAS-GAN for Contrast-free Angiography Synthesis

no code implementations • 11 Oct 2024 • De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Hao Li, Tian-Yu Xiang, Zeng-Guang Hou

Iodinated contrast agents are widely used in numerous interventional procedures, yet they pose substantial health risks to patients.

Disentanglement

Compressing high-resolution data through latent representation encoding for downscaling large-scale AI weather forecast model

no code implementations • 10 Oct 2024 • Qian Liu, Bing Gong, Xiaoran Zhuang, Xiaohui Zhong, Zhiming Kang, Hao Li

The rapid advancement of artificial intelligence (AI) in weather research has been driven by the ability to learn from large, high-dimensional datasets.

Image Compression

Enhancing Playback Performance in Video Recommender Systems with an On-Device Gating and Ranking Framework

no code implementations • 8 Oct 2024 • Yunfei Yang, Zhenghao Qi, Honghuan Wu, Qi Song, Tieyao Zhang, Hao Li, Yimin Tu, Kaiqiao Zhan, Ben Wang

Specifically, we utilize a gate model to identify videos that may have playback issues in real-time, and then we employ a ranking model to select the optimal result from a locally-cached pool to replace the stuttering videos.

Recommendation Systems

AdaptDiff: Cross-Modality Domain Adaptation via Weak Conditional Semantic Diffusion for Retinal Vessel Segmentation

1 code implementation • 6 Oct 2024 • Dewei Hu, Hao Li, Han Liu, Jiacheng Wang, Xing Yao, Daiwei Lu, Ipek Oguz

Subsequently, we sample on the target domain with binary vessel masks from the source domain to get paired data, i.e., target domain synthetic images conditioned on the binary vessel map.

Image Segmentation Retinal Vessel Segmentation +3

One-step Noisy Label Mitigation

1 code implementation • 2 Oct 2024 • Hao Li, Jiayang Gu, Jingkuan Song, An Zhang, Lianli Gao

Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical.

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval

no code implementations • 30 Sep 2024 • Yabing Wang, Le Wang, Qiang Zhou, Zhibin Wang, Hao Li, Gang Hua, Wei Tang

Then, we take these semantic slots as internal features and leverage them to interact with the visual features.

Cross-Modal Retrieval Large Language Model +2

HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

no code implementations • 28 Sep 2024 • Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani

In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions.

Informativeness named-entity-recognition +4

Triple Point Masking

1 code implementation • 26 Sep 2024 • Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin

Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation.

FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset

no code implementations • 23 Sep 2024 • Donglin Di, He Feng, Wenzhang Sun, Yongjia Ma, Hao Li, Wei Chen, Xiaofei Gou, Tonghua Su, Xun Yang

We obtain the corresponding performance benchmarks and compare them with those trained on public datasets to demonstrate the superiority of our dataset.

Image Generation Unconditional Video Generation

Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework

no code implementations • 18 Sep 2024 • Yuping Wu, Hao Li, Hongbo Zhu, Goran Nenadic, Xiao-jun Zeng

In this paper, we first introduce a parameter-free highlight method into the encoder-decoder framework: replacing the encoder attention mask with a saliency mask in the cross-attention module to force the decoder to focus only on salient parts of the input.

Abstractive Text Summarization Decoder

FuXi-2.0: Advancing machine learning weather forecasting model for practical applications

1 code implementation • 11 Sep 2024 • Xiaohui Zhong, Lei Chen, Xu Fan, Wenxu Qian, Jun Liu, Hao Li

The results demonstrate that FuXi-2.0 consistently outperforms ECMWF HRES in forecasting key meteorological variables relevant to these sectors.

Weather Forecasting

Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

no code implementations • 10 Sep 2024 • Hao Li, Dong Liang, Zheng Xie

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks.

Meta-Learning Thompson Sampling

NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset

no code implementations • 2 Sep 2024 • Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu

Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet and in the memes popular among teenagers.

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

no code implementations • 29 Aug 2024 • Xiangchen Yin, Donglin Di, Lei Fan, Hao Li, Chen Wei, Xiaofei Gou, Yang Song, Xiao Sun, Xun Yang

Recent methods using diffusion models have made significant progress in human image generation with various additional controls such as pose priors.

Image Generation Pose Estimation

Cascaded Temporal Updating Network for Efficient Video Super-Resolution

no code implementations • 26 Aug 2024 • Hao Li, Jiangxin Dong, Jinshan Pan

However, the key components in recurrent-based VSR networks significantly impact model efficiency, e.g., the alignment module occupies a substantial portion of model parameters, while the bidirectional propagation mechanism significantly amplifies the inference time.

Video Reconstruction Video Super-Resolution

LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction

no code implementations • 22 Aug 2024 • Aishik Nagar, Viktor Schlegel, Thanh-Tung Nguyen, Hao Li, Yuping Wu, Kuluhan Binici, Stefan Winkler

Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation.

named-entity-recognition Named Entity Recognition +3

Dynamic Neural Dowker Network: Approximating Persistent Homology in Dynamic Directed Graphs

1 code implementation • 17 Aug 2024 • Hao Li, Hao Jiang, Jiajun Fan, Dongsheng Ye, Liang Du

This paper introduces the Dynamic Neural Dowker Network (DNDN), a novel framework specifically designed to approximate the results of dynamic Dowker filtration, aiming to capture the high-order topological features of dynamic directed graphs.

Graph Classification Graph Neural Network +1

ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack

no code implementations • 10 Aug 2024 • Ziyi Gao, Kai Chen, Zhipeng Wei, Tingshu Mou, Jingjing Chen, Zhiyu Tan, Hao Li, Yu-Gang Jiang

However, existing works on diffusion-based unrestricted attacks are mostly focused on images yet are seldom explored in videos.

Adversarial Attack Denoising

FuXi Weather: A data-to-forecast machine learning system for global weather

1 code implementation • 10 Aug 2024 • Xiuyu Sun, Xiaohui Zhong, Xiaoze Xu, Yuanqing Huang, Hao Li, J. David Neelin, Deliang Chen, Jie Feng, Wei Han, Libo Wu, Yuan Qi

Weather forecasting traditionally relies on numerical weather prediction (NWP) systems that integrate global observational systems, data assimilation (DA), and forecasting models.

Computational Efficiency Weather Forecasting

PRISM Lite: A lightweight model for interactive 3D placenta segmentation in ultrasound

2 code implementations • 9 Aug 2024 • Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz

The proposed model adopts the segmentation from our fully automated model for initialization and is designed in a human-in-the-loop manner to achieve iterative improvements.

Interactive Segmentation Placenta Segmentation +1

Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets

no code implementations • 8 Aug 2024 • Hao Li, Han Liu, Heinrich von Busch, Robert Grimm, Henkjan Huisman, Angela Tong, David Winkel, Tobias Penzkofer, Ivan Shabunin, Moon Hyung Choi, Qingsong Yang, Dieter Szolar, Steven Shea, Fergus Coakley, Mukesh Harisinghani, Ipek Oguz, Dorin Comaniciu, Ali Kamen, Bin Lou

This method translates diffusion-weighted imaging (DWI) acquisitions, including apparent diffusion coefficient (ADC) and individual DW images acquired using various b-values, to align with the style of images acquired using b-values recommended by Prostate Imaging Reporting and Data System (PI-RADS) guidelines.

Lesion Detection Unsupervised Domain Adaptation

VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

no code implementations • 5 Aug 2024 • Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Hao Li

Produced through a coarse-to-fine curation strategy, this dataset guarantees high-quality videos and detailed captions with excellent temporal consistency.

Text-to-Video Generation Video Generation

Model Hijacking Attack in Federated Learning

no code implementations • 4 Aug 2024 • Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang

Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition.

Autonomous Driving Data Poisoning +2

An Error Discovery and Correction for the Family of V-Shaped BPSO Algorithms

no code implementations • 25 Jul 2024 • Qing Zhao, Chengkui Zhang, Hao Li, Ting Ke

Traditionally, these algorithms therefore have to rely on a low w value in the later stage to force convergence, which also makes them quickly lose their search ability and become prone to getting trapped in local optima.

Retinal IPA: Iterative KeyPoints Alignment for Multimodal Retinal Imaging

no code implementations • 25 Jul 2024 • Jiacheng Wang, Hao Li, Dewei Hu, Rui Xu, Xing Yao, Yuankai K. Tao, Ipek Oguz

We propose a novel framework for retinal feature point alignment, designed for learning cross-modality features to enhance matching and registration across multi-modality retinal images.

Segmentation

SeqMIA: Sequential-Metric Based Membership Inference Attack

1 code implementation • 21 Jul 2024 • Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang

Building upon this signal, we introduce a novel attack method called Sequential-metric based Membership Inference Attack (SeqMIA).

Inference Attack Knowledge Distillation +1

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

no code implementations • 15 Jul 2024 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis.

Graph Attention Motion Generation +1

Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

1 code implementation • 11 Jul 2024 • Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner.

Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound images

2 code implementations • 10 Jul 2024 • Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz

These models produce a segmentation from visual prompts provided to indicate the target region, which may offer a feasible solution for practical use.

Interactive Segmentation Placenta Segmentation +1

Adaptively Robust and Sparse K-means Clustering

1 code implementation • 9 Jul 2024 • Hao Li, Shonosuke Sugasawa, Shota Katayama

While K-means is known to be a standard clustering algorithm, its performance may be compromised due to the presence of outliers and high-dimensional noisy variables.

Clustering

Exploring the Causality of End-to-End Autonomous Driving

1 code implementation • 9 Jul 2024 • Jiankun Li, Hao Li, JiangJiang Liu, Zhikang Zou, Xiaoqing Ye, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

Deep learning-based models are widely deployed in autonomous driving, especially end-to-end solutions, which are attracting increasing attention.

Autonomous Driving counterfactual

Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality

1 code implementation • 7 Jul 2024 • Hao Li, Gopi Krishnan Rajbahadur, Cor-Paul Bezemer

Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

1 code implementation • 30 Jun 2024 • Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy.

Anatomy Image Segmentation +2

XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis

no code implementations • 26 Jun 2024 • Hao Li, Ming Yuan, Yan Zhang, Chenming Wu, Chen Zhao, Chunyu Song, Haocheng Feng, Errui Ding, Dingwen Zhang, Jingdong Wang

To address this, this paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.

Autonomous Driving Benchmarking

VDG: Vision-Only Dynamic Gaussian for Driving Simulation

no code implementations • 26 Jun 2024 • Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

Dynamic Gaussian splatting has led to impressive advances in scene reconstruction and novel-view image synthesis.

Image Generation

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

1 code implementation • 24 Jun 2024 • Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

Our evaluation across 24 text-to-image generation models demonstrates that EvalAlign not only provides superior metric stability but also aligns more closely with human preferences than existing metrics, confirming its effectiveness and utility in model assessment.

Text-to-Image Generation

Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

1 code implementation • 22 Jun 2024 • Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the fluorescently labeled images from label-free microscopy.

Partially Labeled Datasets

Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments

no code implementations • 19 Jun 2024 • Yuhan Zhu, Jian Wang, Bing Li, Xuxian Tang, Hao Li, Neng Zhang, Yuqi Zhao

Experiments conducted on the dataset collected from the benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, significantly outperforming state-of-the-art approaches with an increase of at least 24.1% in top-1 accuracy.

Graph Neural Network

Text-aware Speech Separation for Multi-talker Keyword Spotting

1 code implementation • 18 Jun 2024 • Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tian Tan, Hao Li, Kai Yu

TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the backend.

Keyword Spotting Speech Separation

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

1 code implementation • 9 Jun 2024 • Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

Crucial to addressing this real-world need are event summary and persona management, which enable reasoning for appropriate long-term dialogue responses.

Response Generation Retrieval

Parameter-Inverted Image Pyramid Networks

1 code implementation • 6 Jun 2024 • Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance.

Computational Efficiency Image Classification +2

Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

2 code implementations • 5 Jun 2024 • Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers the tasks of claim and evidence identification (Task 1 ED), evidence convincingness ranking (Task 2 ECR), argumentative essay summarisation and human preference ranking (Task 3 ASR) and metric learning for automated evaluation of resulting essays, based on human feedback along argument quality dimensions (Task 4 SQE).

Argument Mining Metric Learning +1

Deep Learning based Performance Testing for Analog Integrated Circuits

no code implementations • 1 Jun 2024 • Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus.

Deep Learning

VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence

no code implementations • 25 May 2024 • Phong Tran, Egor Zakharov, Long-Nhat Ho, Liwen Hu, Adilbek Karmanov, Aviral Agarwal, Mclean Goldwhite, Ariana Bermudez Venegas, Anh Tuan Tran, Hao Li

We further integrate our novel head reenactment solution into an accessible high-fidelity VR telepresence system, where any person can instantly build a personalized neural head avatar from any photo and bring it to life using the headset.

Disentanglement

Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation

1 code implementation • 22 May 2024 • Dingwen Zhang, Hao Li, Diqi He, Nian Liu, Lechao Cheng, Jingdong Wang, Junwei Han

Experimental evaluations conducted on MS COCO, Cityscapes, and CTW1500 datasets indicate that the QEIS models' performance can be significantly improved when pre-trained with our method.

Instance Segmentation Semantic Segmentation +1

An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation

1 code implementation • 21 May 2024 • Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li

Moreover, the model capacity of the text encoder from CLIP is relatively limited compared to Large Language Models (LLMs), which offer multilingual input, accommodate longer context, and achieve superior text representation.

Language Modelling Large Language Model +1

Local-peak scale-invariant feature transform for fast and random image stitching

no code implementations • 14 May 2024 • Hao Li, Lipo Wang, Tianyun Zhao, Wei Zhao

Image stitching aims to construct a wide field of view with high spatial resolution, which cannot be achieved in a single exposure.

Image Stitching

Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval

no code implementations • 10 May 2024 • Mengjia Niu, Hao Li, Jie Shi, Hamed Haddadi, Fan Mo

Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare.

Hallucination Knowledge Graphs +1

FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

no code implementations • 9 May 2024 • Xiaohui Zhong, Lei Chen, Hao Li, Jun Liu, Xu Fan, Jie Feng, Kan Dai, Jing-Jia Luo, Jie Wu, Bo Lu

This innovative approach makes FuXi-ENS an advancement over the traditional ones that use L1 loss combined with the KL loss in standard VAE models for ensemble weather forecasting.

Weather Forecasting

FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

1 code implementation • 21 Apr 2024 • Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories.

Anomaly Detection Position +1

Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

no code implementations • 14 Apr 2024 • Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Ren Jin, Yuanqing Xia

Through extensive experiments, our findings reveal that Arena can boost inference speeds by up to 1.58× and 1.82× on average while consuming only 47% and 31% of the bandwidth, respectively, all with high inference accuracy.

Edge-computing

Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations

1 code implementation • 12 Apr 2024 • Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li

Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance.

Weather Forecasting

Multi-level Graph Subspace Contrastive Learning for Hyperspectral Image Clustering

no code implementations • 8 Apr 2024 • Jingxin Wang, Renxiang Guan, Kainan Gao, Zihao Li, Hao Li, Xianju Li, Chang Tang

Multi-level graph subspace contrastive learning: multi-level contrastive learning was conducted to obtain local-global joint graph representations, to improve the consistency of the positive samples between views, and to obtain more robust graph embeddings.

Clustering Contrastive Learning +1

360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

1 code implementation • 8 Apr 2024 • Shen Gao, Hao Li, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang

The framework employs a novel 360° performance assessment method for multi-perspective performance evaluation with fine-grained assessment.

Language Modelling Large Language Model

Collaborative Feedback Discriminative Propagation for Video Super-Resolution

1 code implementation • 6 Apr 2024 • Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

However, inaccurate alignment usually leads to aligned features with significant artifacts, which will be accumulated during propagation and thus affect video restoration.

Video Reconstruction Video Restoration +1

From Two-Stream to One-Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation

no code implementations • 25 Mar 2024 • Yang Luo, Xiqing Guo, Hao Li

Due to the complementary nature of visible light and thermal infrared modalities, object tracking based on the fusion of visible light images and thermal images (referred to as RGB-T tracking) has received increasing attention from researchers in recent years.

Knowledge Distillation Object Tracking +1

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

no code implementations • 20 Mar 2024 • Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.

Keyword Spotting

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

no code implementations • 15 Mar 2024 • Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model.

Generalizable Novel View Synthesis Novel View Synthesis

LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

1 code implementation • 14 Mar 2024 • Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Zheli Liu, Xiaojie Yuan

Moreover, LAN can also be applied to post-hoc ITD, surpassing 8 competitive baselines by at least 7.70% and 4.03% in AUC on two datasets.

Anomaly Detection Graph structure learning

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

no code implementations • CVPR 2024 • QiHao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories.

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

no code implementations CVPR 2024 Junyan Wang, Zhenhong Sun, Zhiyu Tan, Xuanbai Chen, Weihua Chen, Hao Li, Cheng Zhang, Yang Song

Vanilla text-to-image diffusion models struggle with generating accurate human images, commonly resulting in imperfect anatomies such as unnatural postures or disproportionate limbs. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls -- human-centric priors such as pose or depth maps -- during the image generation phase.

Image Generation

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

1 code implementation CVPR 2024 Hao Li, Ying Chen, Yifei Chen, Wenxian Yang, Bowen Ding, Yuchen Han, Liansheng Wang, Rongshan Yu

It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics.

Image Classification Language Modelling +3

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings

no code implementations 25 Feb 2024 Hao Wang, Hao Li, Minlie Huang, Lei Sha

In addition, our approach can be generalized into a broader method for generating transferable adversarial suffixes that can successfully attack multiple LLMs, even black-box LLMs, such as ChatGPT and Gemini.

Language Modelling Large Language Model

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

no code implementations 22 Feb 2024 Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

GAN-based image attribute editing first leverages GAN inversion to project real images into the latent space of the GAN and then manipulates the corresponding latent codes.

Attribute

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations 20 Feb 2024 Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

no code implementations 2 Feb 2024 Hao Li, Wei Wang, Cong Wang, Zhigang Luo, Xinwang Liu, Kenli Li, Xiaochun Cao

Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training.

Object Detection +2

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

1 code implementation 18 Jan 2024 Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen

To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.

Instance Segmentation Semantic Segmentation +1

Hierarchical Fashion Design with Multi-stage Diffusion Models

no code implementations 15 Jan 2024 Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Ying Cao

Cross-modal fashion synthesis and editing offer intelligent support to fashion designers by enabling the automatic generation and local modification of design drafts. While current diffusion models demonstrate commendable stability and controllability in image synthesis, they still face significant challenges in generating fashion designs from abstract design elements and in fine-grained editing. Abstract sensory expressions, e.g., office, business, and party, form the high-level design concepts, while measurable aspects like sleeve length, collar type, and pant length are considered the low-level attributes of clothing. Controlling and editing fashion images using lengthy text descriptions is difficult. In this paper, we propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model that encompasses high-level design concepts and low-level clothing attributes in a hierarchical structure. Specifically, we categorize the input text into different levels and feed them to the diffusion model at different time steps according to the criteria of professional clothing designers. HieraFashDiff allows designers to incrementally add low-level attributes after high-level prompts for interactive editing. In addition, we design a differentiable loss function with a mask in the sampling process to preserve non-edited areas. Comprehensive experiments on our new hierarchical fashion dataset demonstrate that our proposed method outperforms other state-of-the-art competitors.

Fashion Synthesis Image Generation

Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

no code implementations 12 Jan 2024 Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

Furthermore, experiments on the continuous speech dataset LibriSpeech demonstrate that, by incorporating audio discrimination, CLAD achieves a significant performance gain over CL without audio discrimination.

Contrastive Learning Keyword Spotting +1

Coordinated Planning of Offshore Charging Stations and Electrified Ships: A Case Study on Shanghai-Busan Maritime Route

no code implementations 25 Dec 2023 Hao Li, Hanqi Tao, Wentao Huang, Hongcai Zhang, Ran Li

Despite the success of electric vehicles on land, electrification of maritime ships is challenged by the dilemma of range anxiety and cargo-carrying capacity.

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

1 code implementation 22 Dec 2023 Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from the text, video, and audio modalities using similarity-based modality alignment and a cross-modality attention mechanism.

Contrastive Learning Multimodal Intent Recognition

Diffusion-based Blind Text Image Super-Resolution

1 code implementation CVPR 2024 Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian

Since text prior is important to guarantee the correctness of the restored text structure according to existing arts, we also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures.

Image Generation Image Super-Resolution

Negative Pre-aware for Noisy Cross-modal Matching

2 code implementations 10 Dec 2023 Xu Zhang, Hao Li, Mang Ye

Since clean samples are more easily distinguished by GMM with increasing noise, the memory bank can still maintain high quality at a high noise ratio.

Cross-modal retrieval with noisy correspondence Image-text matching +3

VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment

no code implementations CVPR 2024 Phong Tran, Egor Zakharov, Long-Nhat Ho, Anh Tuan Tran, Liwen Hu, Hao Li

We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions.

Disentanglement Self-Supervised Learning

FreestyleRet: Retrieving Images from Style-Diversified Queries

1 code implementation 5 Dec 2023 Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan

In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.

Image Retrieval Retrieval

Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition

1 code implementation 30 Nov 2023 Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang

With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction.

Class Incremental Learning +3

Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

1 code implementation 21 Nov 2023 Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz

High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV).

Computational Efficiency

SpectralGPT: Spectral Remote Sensing Foundation Model

no code implementations 13 Nov 2023 Danfeng Hong, Bing Zhang, Xuyang Li, YuXuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, Jocelyn Chanussot

The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner.

Change Detection Representation Learning +3

InfMLLM: A Unified Framework for Visual-Language Tasks

1 code implementation 12 Nov 2023 Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi

Our experiments demonstrate that preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding.

Image Captioning Instruction Following +3