Search Results for author: Xiang Li

Found 607 papers, 264 papers with code

Gait Recognition from a Single Image using a Phase-Aware Gait Cycle Reconstruction Network

no code implementations ECCV 2020 Chi Xu, Yasushi Makihara, Xiang Li, Yasushi Yagi, Jianfeng Lu

Specifically, a phase estimation network is introduced for the input single image, and the gait cycle reconstruction network exploits the estimated phase to mitigate the dependence of an encoded feature on the phase of that single image.

Gait Recognition

Towards Robust Neural Machine Translation with Iterative Scheduled Data-Switch Training

1 code implementation COLING 2022 Zhongjian Miao, Xiang Li, Liyan Kang, Wen Zhang, Chulun Zhou, Yidong Chen, Bin Wang, Min Zhang, Jinsong Su

Most existing methods on robust neural machine translation (NMT) construct adversarial examples by injecting noise into authentic examples and indiscriminately exploit two types of examples.

Machine Translation NMT +2

Implicit Rhetorical Questions Recognition Model Combined with Sentiment Analysis

no code implementations CCL 2021 Xiang Li, Chengwei Liu, Xiaoxu Zhu

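Rhetorical questions are a commonly used rhetorical device in modern Chinese, and they can be divided into explicit and implicit rhetorical questions according to whether they contain a rhetorical marker. Implicit rhetorical questions express richer sentiment and take more complex forms, which makes them more challenging to recognize. This paper first expands a Chinese rhetorical question corpus to more than 10,000 sentences and then, targeting the characteristics of implicit rhetorical questions, proposes an implicit rhetorical question recognition model that incorporates sentiment analysis. The model considers the semantic and contextual information of a sentence and uses a sentiment analysis task to assist in recognizing implicit rhetorical questions. Experimental results show that the proposed model performs well on the implicit rhetorical question recognition task.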

Sentiment Analysis

JointCL: A Joint Contrastive Learning Framework for Zero-Shot Stance Detection

1 code implementation ACL 2022 Bin Liang, Qinglin Zhu, Xiang Li, Min Yang, Lin Gui, Yulan He, Ruifeng Xu

In this paper, we propose a joint contrastive learning (JointCL) framework, which consists of stance contrastive learning and target-aware prototypical graph contrastive learning.

Contrastive Learning Zero-Shot Stance Detection

Multi-Modal Sarcasm Detection via Cross-Modal Graph Convolutional Network

1 code implementation ACL 2022 Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, Ruifeng Xu

Then, the descriptions of the objects serve as a bridge to determine the importance of the association between the objects of the image modality and the contextual words of the text modality, so as to build a cross-modal graph for each multi-modal instance.

Sarcasm Detection

BIT-Xiaomi’s System for AutoSimTrans 2022

no code implementations NAACL (AutoSimTrans) 2022 Mengge Liu, Xiang Li, Bao Chen, Yanzhi Tian, Tianwei Lan, Silin Li, Yuhang Guo, Jian Luan, Bin Wang

This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge.

Chunking Data Augmentation +1

Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

no code implementations 2 Feb 2025 Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Kyungsang Kim, Jin Sung Kim, Xiang Li, Quanzheng Li

Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity).

Fairness Image Segmentation +4

EgoMe: Follow Me via Egocentric View in Real World

no code implementations 31 Jan 2025 Heqian Qiu, Zhaofeng Shi, Lanxiao Wang, Huiyu Xiong, Xiang Li, Hongliang Li

For a pair of videos, one video captures an exocentric view of the imitator observing the demonstrator's actions, while the other captures an egocentric view of the imitator subsequently following those actions.

Imitation Learning

Value Function Decomposition in Markov Recommendation Process

no code implementations 29 Jan 2025 Xiaobei Wang, Shuchang Liu, Qingpeng Cai, Xiang Li, Lantao Hu, Han Li, Guangming Xie

Recent advances in recommender systems have shown that user-system interaction essentially formulates long-term optimization problems, and online reinforcement learning can be adopted to improve recommendation performance.

Recommendation Systems

Distinguished Quantized Guidance for Diffusion-based Sequence Recommendation

no code implementations 29 Jan 2025 Wenyu Mao, Shuchang Liu, Haoyang Liu, Haozhe Liu, Xiang Li, Lanatao Hu

To address these issues, we propose Distinguished Quantized Guidance for Diffusion-based Sequence Recommendation (DiQDiff), which aims to extract robust guidance to understand user interests and generate distinguished items for personalized user interests within DMs.

Denoising Quantization +1

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

1 code implementation 24 Jan 2025 Xiaohao Xu, Tianyi Zhang, Shibo Zhao, Xiang Li, Sibo Wang, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Sebastian Scherer, Xiaonan Huang

We aim to redefine robust ego-motion estimation and photorealistic 3D reconstruction by addressing a critical limitation: the reliance on noise-free data in existing models.

3D Reconstruction Benchmarking +2

NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild

1 code implementation 23 Jan 2025 Yongxiang Liu, Weijie Li, Li Liu, Jie zhou, Xuying Xiong, Bowen Peng, Yafei Song, Wei Yang, Tianpeng Liu, Zhen Liu, Xiang Li

This paper introduces NUDT4MSTAR, a large-scale SAR dataset for remote sensing target recognition in the wild, including 40 vehicle target types and various imaging conditions across 5 realistic scenes.

Earth Observation Object Recognition +1

Uncovering Bias in Foundation Models: Impact, Testing, Harm, and Mitigation

1 code implementation 14 Jan 2025 Shuzhou Sun, Li Liu, Yongxiang Liu, Zhen Liu, Shuanghui Zhang, Janne Heikkilä, Xiang Li

Bias in Foundation Models (FMs) - trained on vast datasets spanning societal and historical knowledge - poses significant challenges for fairness and equity across fields such as healthcare, education, and finance.

Fairness

3DCoMPaT200: Language-Grounded Compositional Understanding of Parts and Materials of 3D Shapes

1 code implementation 12 Jan 2025 Mahmoud Ahmed, Xiang Li, Arpit Prajapati, Mohamed Elhoseiny

To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, with 200 object categories with $\approx$5 times larger object vocabulary compared to 3DCoMPaT and $\approx$4 times larger part categories.

Navigate Object +1

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

1 code implementation 8 Jan 2025 Xin Zhang, Xue Yang, YuXuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li

Our approach can effectively improve the performance of existing state-of-the-art weakly supervised methods and even surpasses fully supervised models on existing optical benchmarks (i.e., DOTA-v1.0 dataset).

object-detection Object Detection +1

Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection

3 code implementations 7 Jan 2025 Xinbin Yuan, Zhaohui Zheng, YuXuan Li, Xialei Liu, Li Liu, Xiang Li, Qibin Hou, Ming-Ming Cheng

Despite rapid development, remote sensing object detection remains challenging for detecting high aspect ratio objects.

 Ranked #1 on Object Detection In Aerial Images on DOTA (using extra training data)

Object object-detection +2

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

1 code implementation 30 Dec 2024 YuXuan Li, Xiang Li, Yunheng Li, YiCheng Zhang, Yimian Dai, Qibin Hou, Ming-Ming Cheng, Jian Yang

To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection).

object-detection Object Detection

SCBench: A Sports Commentary Benchmark for Video LLMs

no code implementations 23 Dec 2024 Kuangzhi Ge, Lingjun Chen, Kevin Zhang, Yulin Luo, Tianyu Shi, Liaoyuan Fan, Xiang Li, Guanqun Wang, Shanghang Zhang

Inspired by these challenges, we propose a novel task, sports video commentary generation, and develop $\textbf{SCBench}$ for Video LLMs.

Benchmarking

Decoupled Functional Central Limit Theorems for Two-Time-Scale Stochastic Approximation

no code implementations 22 Dec 2024 Yuze Han, Xiang Li, Jiadong Liang, Zhihua Zhang

In two-time-scale stochastic approximation (SA), two iterates are updated at different rates, governed by distinct step sizes, with each update influencing the other.

Breaking the Context Bottleneck on Long Time Series Forecasting

1 code implementation 21 Dec 2024 Chao Ma, Yikai Hou, Xiang Li, Yinggang Sun, Haining Yu, Zhou Fang, Jiaxing Qu

To obtain such long foresight, models must be both efficient and effective in processing long sequences.

Decision Making Time Series +1

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

1 code implementation 19 Dec 2024 Wenqiao Li, Bozhong Zheng, Xiaohao Xu, Jinye Gan, Fading Lu, Xiang Li, Na Ni, Zheng Tian, Xiaonan Huang, Shenghua Gao, Yingna Wu

Object anomaly detection is essential for industrial quality inspection, yet traditional single-sensor methods face critical limitations.

Anomaly Detection Object +1

PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization

1 code implementation 19 Dec 2024 Jiayi Wu, Hengyi Cai, Lingyong Yan, Hao Sun, Xiang Li, Shuaiqiang Wang, Dawei Yin, Ming Gao

The emergence of Retrieval-augmented generation (RAG) has alleviated the issues of outdated and hallucinatory content in the generation of large language models (LLMs), yet it still reveals numerous limitations.

Informativeness RAG +1

Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

no code implementations 17 Dec 2024 Yibo Zhao, Jiapeng Zhu, Can Xu, Xiang Li

The rapid growth of social media platforms has raised significant concerns regarding online content toxicity.

SEAGraph: Unveiling the Whole Story of Paper Review Comments

no code implementations 16 Dec 2024 Jianxiang Yu, Jiaqi Tan, Zichen Ding, Jiapeng Zhu, Jiahao Li, Yao Cheng, Qier Cui, Yunshi Lan, Xiang Li

Peer review, as a cornerstone of scientific research, ensures the integrity and quality of scholarly work by providing authors with objective feedback for refinement.

Coupling-based Convergence Diagnostic and Stepsize Scheme for Stochastic Gradient Descent

1 code implementation 15 Dec 2024 Xiang Li, Qiaomin Xie

The convergence behavior of Stochastic Gradient Descent (SGD) crucially depends on the stepsize configuration.

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

1 code implementation 14 Dec 2024 Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum

With its fully-differentiable design and semantic-rich latent space, our experiment demonstrates that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models.

Denoising Image Generation

Agent-based Video Trimming

no code implementations 12 Dec 2024 Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang

As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights.

Highlight Detection Moment Retrieval +2

ATPrompt: Textual Prompt Learning with Embedded Attributes

1 code implementation 12 Dec 2024 Zheng Li, Yibing Song, Penghai Zhao, Ming-Ming Cheng, Xiang Li, Jian Yang

Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align image and text (category) spaces for downstream tasks.

Attribute Large Language Model

PAFFA: Premeditated Actions For Fast Agents

no code implementations 10 Dec 2024 Shambhavi Krishna, Zheng Chen, Vaibhav Kumar, Xiaojiang Huang, Yingjie Li, Fan Yang, Xiang Li

Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as Web interfaces) for enhanced scalability and functionality.

Natural Language Understanding

Personalized and Sequential Text-to-Image Generation

no code implementations 10 Dec 2024 Ofir Nabati, Guy Tennenholtz, ChihWei Hsu, MoonKyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier

We address the problem of personalized, interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions.

Language Modeling Language Modelling +2

Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System

no code implementations 6 Dec 2024 Fang Zeng, Zhiliang Lyu, Quanzheng Li, Xiang Li

This study introduces "RadCouncil," a multi-agent Large Language Model (LLM) framework designed to enhance the generation of impressions in radiology reports from the finding section.

Language Modeling Language Modelling +2

Community Detection with Heterogeneous Block Covariance Model

no code implementations 4 Dec 2024 Xiang Li, Yunpeng Zhao, Qing Pan, Ning Hao

Community detection is the task of clustering objects based on their pairwise relationships.

Community Detection model +1

XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

1 code implementation 2 Dec 2024 Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Jiuxiang Gu, Jindong Wang, Zhe Lin, Bhiksha Raj

Improvements in architecture, quantization techniques, and training recipes have significantly enhanced both image reconstruction and the downstream generation quality.

Image Reconstruction Quantization

Impromptu Cybercrime Euphemism Detection

no code implementations 2 Dec 2024 Xiang Li, Yucheng Zhou, Laiping Zhao, Jing Li, Fangming Liu

Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training.

A Compact Hybrid Battery Thermal Management System for Enhanced Cooling

no code implementations 1 Dec 2024 Zhipeng Lyu, Jinrong Su, Zhe Li, Xiang Li, Hanghang Yan, Lei Chen

Hybrid battery thermal management systems (HBTMS) combining active liquid cooling and passive phase change materials (PCM) cooling have shown a potential for the thermal management of lithium-ion batteries.

Management

Perturbation Ontology based Graph Attention Networks

no code implementations 27 Nov 2024 Yichen Wang, Jie Wang, Fulin Wang, Xiang Li, Hao Yin, Bhiksha Raj

In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts.

Graph Attention Graph Representation Learning +3

Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation

no code implementations 26 Nov 2024 Xiang Li, Zixuan Huang, Anh Thai, James M. Rehg

Symmetry is a ubiquitous and fundamental property in the visual world, serving as a critical cue for perception and structure interpretation.

3D Generation 3D geometry +1

Robust Detection of Watermarks for Large Language Models Under Human Edits

1 code implementation 21 Nov 2024 Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

We prove that the Tr-GoF test achieves optimality in robust detection of the Gumbel-max watermark in a certain asymptotic regime of substantial text modifications and vanishing watermark signals.

Scalable Deep Metric Learning on Attributed Graphs

no code implementations 20 Nov 2024 Xiang Li, Gagan Agrawal, Ruoming Jin, Rajiv Ramnath

We consider the problem of constructing embeddings of large attributed graphs and supporting multiple downstream learning tasks.

Contrastive Learning Graph Embedding +4

UMGAD: Unsupervised Multiplex Graph Anomaly Detection

no code implementations 19 Nov 2024 Xiang Li, Jianpeng Qi, Zhongying Zhao, Guanjie Zheng, Lei Cao, Junyu Dong, Yanwei Yu

To address the above challenges, we propose a novel Unsupervised Multiplex Graph Anomaly Detection method, named UMGAD.

Attribute Contrastive Learning +2

Federated Contrastive Learning of Graph-Level Representations

no code implementations 18 Nov 2024 Xiang Li, Gagan Agrawal, Rajiv Ramnath, Ruoming Jin

This points to the need for federated learning for graph-level representations, a topic that has not been explored much, especially in an unsupervised setting.

Clustering Contrastive Learning +2

Debiasing Watermarks for Large Language Models via Maximal Coupling

1 code implementation 17 Nov 2024 Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang

Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication.

HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization

no code implementations 16 Nov 2024 Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Wei Liu, Xiang Li, Fei Dou, Tianming Liu, Jin Lu

Experimental results on RoBERTa-large and OPT-1.3B across multiple tasks show that HELENE achieves up to a 20x speedup compared to MeZO, with average accuracy improvements of 1.5%.

parameter-efficient fine-tuning

GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding

1 code implementation 16 Nov 2024 Yue Zhou, Mengcheng Lan, Xiang Li, Yiping Ke, Xue Jiang, Litong Feng, Wayne Zhang

Remote sensing (RS) visual grounding aims to use natural language expression to locate specific objects (in the form of the bounding box or segmentation mask) in RS images, enhancing human interaction with intelligent RS interpretation systems.

Language Modeling Language Modelling +3

Is Graph Convolution Always Beneficial For Every Feature?

no code implementations 12 Nov 2024 Yilun Zheng, Xiang Li, Sitao Luan, Xiaojiang Peng, Lihui Chen

In prior studies, to assess the impacts of graph convolution on features, people proposed metrics based on feature homophily to measure feature consistency with the graph topology.

feature selection Informativeness

Rethinking Structure Learning For Graph Neural Networks

no code implementations 12 Nov 2024 Yilun Zheng, Zhuofan Zhang, ZiMing Wang, Xiang Li, Sitao Luan, Xiaojiang Peng, Lihui Chen

Surprisingly, our empirical observations and theoretical analysis show that no matter which type of graph structure construction method is used, after feeding the same GSL bases to the newly constructed graph, there is no MI gain compared to the original GSL bases.

Graph structure learning

TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image

no code implementations 5 Nov 2024 Xiang Li, Mingsi Liu, Lixin Duan

Purpose: Automatic and accurate segmentation of fundus vessel images has become an essential prerequisite for computer-aided diagnosis of ophthalmic diseases such as diabetes mellitus.

Retinal Vessel Segmentation

ARN-LSTM: A Multi-Stream Fusion Model for Skeleton-based Action Recognition

no code implementations 4 Nov 2024 Chuanchuan Wang, Ahmad Sufril Azlan Mohmamed, Mohd Halim Bin Mohd Noor, Xiao Yang, Feifan Yi, Xiang Li

This paper presents the ARN-LSTM architecture, a novel multi-stream action recognition model designed to address the challenge of simultaneously capturing spatial motion and temporal dynamics in action sequences.

Action Recognition Group Activity Recognition +2

Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

3 code implementations 4 Nov 2024 Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Xuemeng Huang, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Jun Xia, Tao Yang, Suncong Zheng, Kan Wu, Dian Jiao, Jinbao Xue, Xipeng Zhang, Decheng Wu, Kai Liu, Dengpeng Wu, Guanghui Xu, Shaohua Chen, Shuang Chen, Xiao Feng, Yigeng Hong, Junqiang Zheng, Chengcheng Xu, Zongwei Li, Xiong Kuang, Jianglu Hu, Yiqi Chen, Yuchi Deng, Guiyang Li, Ao Liu, Chenchen Zhang, Shihui Hu, Zilong Zhao, Zifan Wu, Yao Ding, Weichao Wang, Han Liu, Roberts Wang, Hao Fei, Peijie Yu, Ze Zhao, Xun Cao, Hai Wang, Fusheng Xiang, Mengyuan Huang, Zhiyuan Xiong, Bin Hu, Xuebin Hou, Lei Jiang, Jianqiang Ma, Jiajia Wu, Yaping Deng, Yi Shen, Qian Wang, Weijie Liu, Jie Liu, Meng Chen, Liang Dong, Weiwen Jia, Hu Chen, Feifei Liu, Rui Yuan, Huilin Xu, Zhenxiang Yan, Tengfei Cao, Zhichao Hu, Xinhua Feng, Dong Du, TingHao Yu, Yangyu Tao, Feng Zhang, Jianchen Zhu, Chengzhong Xu, Xirui Li, Chong Zha, Wen Ouyang, Yinben Xia, Xiang Li, Zekun He, Rongpeng Chen, Jiawei Song, Ruibin Chen, Fan Jiang, Chongqing Zhao, Bo wang, Hao Gong, Rong Gan, Winston Hu, Zhanhui Kang, Yong Yang, Yuhong Liu, Di Wang, Jie Jiang

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens.

Logical Reasoning Mathematical Problem-Solving

TableGPT2: A Large Multimodal Model with Tabular Data Integration

1 code implementation 4 Nov 2024 Aofeng Su, Aowen Wang, Chao Ye, Chen Zhou, Ga Zhang, Gang Chen, Guangcheng Zhu, Haobo Wang, Haokai Xu, Hao Chen, Haoze Li, Haoxuan Lan, Jiaming Tian, Jing Yuan, Junbo Zhao, Junlin Zhou, Kaizhe Shou, Liangyu Zha, Lin Long, Liyao Li, Pengzuo Wu, Qi Zhang, Qingyi Huang, Saisai Yang, Tao Zhang, Wentao Ye, Wufang Zhu, Xiaomeng Hu, Xijun Gu, Xinjie Sun, Xiang Li, Yuhang Yang, Zhiqing Xiao

In response, we introduce TableGPT2, a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples, a scale of table-related data unprecedented in prior research.

Benchmarking Data Integration

Multi-Channel Hypergraph Contrastive Learning for Matrix Completion

no code implementations 2 Nov 2024 Xiang Li, Changsheng Shui, Yanwei Yu, Chao Huang, Zhongying Zhao, Junyu Dong

The (rating) matrix completion is essentially a rating prediction process, which is also a significant problem in recommender systems.

Contrastive Learning Hypergraph Contrastive Learning +2

MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization

1 code implementation 1 Nov 2024 Jingming Guo, Yan Liu, Yu Meng, Zhiwei Tao, Banglan Liu, Gang Chen, Xiang Li

The Mixture of Experts (MoE) is an advanced model architecture in the industry that combines multiple specialized expert models from various domains into a single supermodel.

8k

Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure

1 code implementation 31 Oct 2024 Xiang Li, Yixiang Dai, Qing Qu

This discovery leads us to investigate the linear counterparts of the nonlinear diffusion models, which are a series of linear models trained to match the function mappings of the nonlinear diffusion denoisers.

Inductive Bias Memorization

EchoFM: Foundation Model for Generalizable Echocardiogram Analysis

1 code implementation 30 Oct 2024 Sekeun Kim, Pengfei Jin, Sifan Song, Cheng Chen, Yiwei Li, Hui Ren, Xiang Li, Tianming Liu, Quanzheng Li

In this paper, we introduce EchoFM, a foundation model specifically designed to represent and analyze echocardiography videos.

Contrastive Learning model +1

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

no code implementations 29 Oct 2024 Kangyang Luo, Zichen Ding, Zhenmin Weng, Lingfeng Qiao, Meng Zhao, Xiang Li, Di Yin, Jinlong Shu

While Chain of Thought (CoT) prompting approaches have significantly consolidated the reasoning capabilities of large language models (LLMs), they still face limitations: they either require extensive human effort or leave room for performance improvement.

Novel Object Synthesis via Adaptive Text-Image Harmony

no code implementations 28 Oct 2024 Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image.

Object

Can Large Language Models Act as Ensembler for Multi-GNNs?

no code implementations 22 Oct 2024 Hanqi Duan, Yao Cheng, Jianxiang Yu, Xiang Li

This allows LensGNN to ensemble multiple GNNs and take advantage of the strengths of LLM, leading to a deeper understanding of both textual semantic information and graph structural information.

Ensemble Learning

On the Diversity of Synthetic Data and its Impact on Training Large Language Models

no code implementations 19 Oct 2024 Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, Marah I. Abdin

The rise of Large Language Models (LLMs) has accentuated the need for diverse, high-quality pre-training data.

Diversity

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

1 code implementation 19 Oct 2024 Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang

In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task.

Monocular Depth Estimation

Hierarchical Conditional Multi-Task Learning for Streamflow Modeling

no code implementations 18 Oct 2024 Shaoming Xu, Arvind Renganathan, Ankush Khandelwal, Rahul Ghosh, Xiang Li, Licheng Liu, Kshitij Tayal, Peter Harrington, Xiaowei Jia, Zhenong Jin, Jonh Nieber, Vipin Kumar

To address this, we propose Hierarchical Conditional Multi-Task Learning (HCMTL), a hierarchical approach that jointly models soil water and snowpack processes based on their causal connections to streamflow.

Management Multi-Task Learning

Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum

1 code implementation 17 Oct 2024 Nashrah Haque, Xiang Li, Zhehui Chen, Yanzhao Wu, Lei Yu, Arun Iyengar, Wenqi Wei

We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), for generating adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label.

Semantic Similarity Semantic Textual Similarity +1

Retrieval Instead of Fine-tuning: A Retrieval-based Parameter Ensemble for Zero-shot Learning

no code implementations 13 Oct 2024 Pengfei Jin, Peng Shu, Sekeun Kim, Qing Xiao, Sifan Song, Cheng Chen, Tianming Liu, Xiang Li, Quanzheng Li

Foundation models have become a cornerstone in deep learning, with techniques like Low-Rank Adaptation (LoRA) offering efficient fine-tuning of large models.

Computational Efficiency Image Segmentation +5

S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack

no code implementations 13 Oct 2024 Yongxiang Liu, Bowen Peng, Li Liu, Xiang Li

Transferable targeted adversarial attacks (TTAs) against deep neural networks have been proven significantly more challenging than untargeted ones, yet they remain relatively underexplored.

EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

no code implementations 12 Oct 2024 Yi Pan, Hanqi Jiang, JunHao Chen, Yiwei Li, Huaqin Zhao, Yifan Zhou, Peng Shu, Zihao Wu, Zhengliang Liu, Dajiang Zhu, Xiang Li, Yohannes Abate, Tianming Liu

Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly utilizing spiking neural networks (SNNs) implemented on neuromorphic hardware.

Image Classification Medical Image Analysis +1

Octopus Inspired Optimization Algorithm: Multi-Level Structures and Parallel Computing Strategies

1 code implementation 10 Oct 2024 Xu Wang, Longji Xu, Yiquan Wang, Yuhua Dong, Xiang Li, Jia Deng, Rui He

This paper introduces a novel bionic intelligent optimisation algorithm, the Octopus Inspired Optimization (OIO) algorithm, which is inspired by the neural structure of the octopus, especially its hierarchical and decentralised interaction properties.

Computational Efficiency Management

SONAR: A Synthetic AI-Audio Detection Framework and Benchmark

1 code implementation 6 Oct 2024 Xiang Li, Pin-Yu Chen, Wenqi Wei

In this paper, we introduce SONAR, a synthetic AI-Audio Detection Framework and Benchmark, aiming to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.

Audio Synthesis DeepFake Detection +4

ECHOPulse: ECG controlled echocardio-grams video generation

1 code implementation 4 Oct 2024 Yiwei Li, Sekeun Kim, Zihao Wu, Hanqi Jiang, Yi Pan, Pengfei Jin, Sifan Song, Yucheng Shi, Tianming Liu, Quanzheng Li, Xiang Li

Echocardiography (ECHO) is essential for cardiac assessments, but its video quality and interpretation rely heavily on manual expertise, leading to inconsistent results from clinical and portable devices.

Video Generation

Volumetric Conditional Score-based Residual Diffusion Model for PET/MR Denoising

1 code implementation 30 Sep 2024 Siyeop Yoon, Rui Hu, Yuang Wang, Matthew Tivnan, Young-Don Son, Dufan Wu, Xiang Li, Kyungsang Kim, Quanzheng Li

PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes.

Denoising

HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

1 code implementation 30 Sep 2024 Changfeng Feng, Zhenyuan Chen, Renke Kou, Guangwei Gao, Chunping Wang, Xiang Li, Xiangbo Shu, Yimian Dai, Qiang Fu, Jian Yang

By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge.

Object object-detection +1

GrokLST: Towards High-Resolution Benchmark and Toolkit for Land Surface Temperature Downscaling

1 code implementation 30 Sep 2024 Qun Dai, Chunyang Yuan, Yimian Dai, YuXuan Li, Xiang Li, Kang Ni, Jianhui Xu, Xiangbo Shu, Jian Yang

Land Surface Temperature (LST) is a critical parameter for environmental studies, but obtaining high-resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing.

AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

no code implementations 27 Sep 2024 Huizi Yu, Jiayan Zhou, Lingyao Li, Shan Chen, Jack Gallifant, Anye Shi, Xiang Li, Wenyue Hua, Mingyu Jin, Guang Chen, Yang Zhou, Zhao Li, Trisha Gupte, Ming-Li Chen, Zahra Azizi, Yongfeng Zhang, Themistocles L. Assimes, Xin Ma, Danielle S. Bitterman, Lin Lu, Lizhou Fan

Here, we developed AIPatient, an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone.

Question Answering RAG +1

High-Fidelity 3D Lung CT Synthesis in ARDS Swine Models Using Score-Based 3D Residual Diffusion Models

no code implementations 26 Sep 2024 Siyeop Yoon, Yujin Oh, Xiang Li, Yi Xin, Maurizio Cereda, Quanzheng Li

Acute respiratory distress syndrome (ARDS) is a severe condition characterized by lung inflammation and respiratory failure, with a high mortality rate of approximately 40%.

Computed Tomography (CT) Management +1

Cascade Prompt Learning for Vision-Language Model Adaptation

2 code implementations 26 Sep 2024 Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.

General Knowledge Image Classification +3

Small Language Models: Survey, Measurements, and Insights

1 code implementation 24 Sep 2024 Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu

Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments.

Benchmarking Decoder +5

$\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models

1 code implementation 20 Sep 2024 Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Jun Zhao, Kang Liu

By efficiently internalizing knowledge, $\textit{SKIntern}$ reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference.

A Unified Framework to Classify Business Activities into International Standard Industrial Classification through Large Language Models for Circular Economy

no code implementations 17 Sep 2024 Xiang Li, Lan Zhao, Junhao Ren, Yajuan Sun, Chuan Fu Tan, Zhiquan Yeo, Gaoxi Xiao

This approach enables any economic activity descriptions provided by businesses worldwide to be categorized into the unified ISIC standard, facilitating the creation of a centralized knowledge repository.

Recommendation Systems

GP-GPT: Large Language Model for Gene-Phenotype Mapping

no code implementations 15 Sep 2024 Yanjun Lyu, Zihao Wu, Lu Zhang, Jing Zhang, Yiwei Li, Wei Ruan, Zhengliang Liu, Xiaowei Yu, Chao Cao, Tong Chen, Minheng Chen, Yan Zhuang, Xiang Li, Rongjie Liu, Chao Huang, Wentao Li, Tianming Liu, Dajiang Zhu

To address these challenges, we present GP-GPT, the first specialized large language model for genetic-phenotype knowledge representation and genomics relation analysis.

Information Retrieval Language Modeling +3