Search Results for author: Wei Li

Found 473 papers, 188 papers with code

《二十四史》古代汉语语义依存图库构建(Construction of Semantic Dependency Graph Bank of Ancient Chinese in twenty four histories)

no code implementations CCL 2022 Tian Huang, Yanqiu Shao, Wei Li

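Semantic dependency graphs are a deep analysis method for semantics in NLP, capable of analyzing the semantic relations between the words in a sentence. Targeting the characteristics of Ancient Chinese, this paper formulates an annotation specification for Ancient Chinese semantic dependency graphs and, using the Twenty-Four Histories as the corpus source, annotates an Ancient Chinese semantic dependency graph bank of 3,000 sentences, with an annotation-consistency kappa of 78.83%. By comparison with a Modern Chinese semantic dependency graph bank, we compile basic statistics on the graph bank and analyze the semantic characteristics and regularities of Ancient Chinese. The statistics show that the semantic distribution of Ancient Chinese broadly follows Zipf's law, and that its description of semantic events has strong features of historical narrative and formal style: it centers on biographies of persons, describes peripheral roles such as time and place in detail, uses calm and objective narrative language, and lacks modifiers that describe modality, mood, degree, temporal state, and the like.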

Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition

no code implementations ECCV 2020 Niamul Quader, Juwei Lu, Peng Dai, Wei Li

State-of-the-art approaches to video-based action and gesture recognition often employ two key concepts: First, they employ multistream processing; second, they use an ensemble of convolutional networks.

3D Action Recognition Action Classification +3

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

1 code implementation EMNLP 2021 Moye Chen, Wei Li, Jiachen Liu, Xinyan Xiao, Hua Wu, Haifeng Wang

Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent.

Document Summarization Multi-Document Summarization +1

Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks

1 code implementation ECCV 2020 Niamul Quader, Md Mafijul Islam Bhuiyan, Juwei Lu, Peng Dai, Wei Li

We propose novel approaches for simultaneously identifying important weights of a convolutional neural network (ConvNet) and providing more attention to the important weights during training.

3D Action Recognition 3D Object Classification +7

Meta-CQG: A Meta-Learning Framework for Complex Question Generation over Knowledge Bases

no code implementations COLING 2022 Kun Zhang, Yunqi Qiu, Yuanzhuo Wang, Long Bai, Wei Li, Xuhui Jiang, HuaWei Shen, Xueqi Cheng

Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints.

Contrastive Learning Decoder +3

基于统一模型的藏文新闻摘要(Abstractive Summarization of Tibetan News Based on Hybrid Model)

no code implementations CCL 2020 Xiaodong Yan, Xiaoqing Xie, Yu Zou, Wei Li

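Seq2seq neural network models have achieved good results in research on Chinese and English text summarization, but text summarization for low-resource languages is still at an exploratory stage, especially for Tibetan. In addition, there is currently no large-scale annotated corpus for summary extraction. This paper proposes a unified model for generating Tibetan news summaries. The TextRank algorithm is used to address the shortage of annotated Tibetan training data. Then, a two-layer bidirectional GRU neural network extracts the sentences that represent the original news, reducing redundant information. Finally, an attention-based Seq2Seq model generates the abstractive summary. We also add a pointer network to handle out-of-vocabulary words. Experimental results show that the ROUGE-1 score improves by 2% over the traditional model. Keywords: text summarization; Tibetan; TextRank; pointer network; Bi-GRU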

Abstractive Text Summarization

基于强化学习的古今汉语句子对齐研究(Research on Sentence Alignment of Ancient and Modern Chinese based on Reinforcement Learning)

no code implementations CCL 2022 Kuai Yu, Yanqiu Shao, Wei Li

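Supervised machine translation based on deep learning has achieved good results, but training requires large amounts of high-quality aligned corpora. For translation between Ancient and Modern Chinese, high-quality parallel corpora are scarce, whereas coarsely aligned document-level and paragraph-level corpora are relatively easy to obtain, so corpus alignment is a valuable and necessary research topic. In sentence-alignment research on traditional bilingual parallel corpora, traditional methods use grammatical information in the bilingual text, such as length, vocabulary, and co-occurring characters, to build a composite criterion that measures the similarity between two sentence pairs. Although such methods perform well on single-sentence alignment, their capacity for semantic matching of sentences is limited, and they perform poorly on some many-to-many alignment patterns. In this paper, we propose using pre-trained language models, which are developing rapidly and have strong semantic representation ability, to take bilingual semantic information into account. Because a pre-trained language model used alone can consider only relatively local information, we further propose a reinforcement learning training objective based on a dynamic programming algorithm to integrate global paragraph information, and we train without supervision. Experimental results show that the model trained with our method outperforms the previous best-performing baseline, with especially large gains on the many-to-many alignment patterns that traditional models struggle to handle.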

Sentence

Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model

1 code implementation 12 May 2025 Wei Li, Ming Hu, Guoan Wang, Lihao Liu, Kaijin Zhou, Junzhi Ning, Xin Guo, ZongYuan Ge, Lixu Gu, Junjun He

In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor consumption.

Depth-Sensitive Soft Suppression with RGB-D Inter-Modal Stylization Flow for Domain Generalization Semantic Segmentation

no code implementations 11 May 2025 Binbin Wei, Yuhang Zhang, Shishun Tian, Muxin Liao, Wei Li, Wenbin Zou

Hence, we propose a novel framework, namely Depth-Sensitive Soft Suppression with RGB-D inter-modal stylization flow (DSSS), focusing on learning domain-invariant features from depth maps for the DG semantic segmentation.

DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution

no code implementations 21 Apr 2025 Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, Yunhe Wang

Recent advances in diffusion models have improved Real-World Image Super-Resolution (Real-ISR), but existing methods lack human feedback integration, risking misalignment with human preference and potentially leading to artifacts, hallucinations and harmful content generation.

Image Super-Resolution

Efficient Spiking Point Mamba for Point Cloud Analysis

no code implementations 19 Apr 2025 Peixi Wu, Bosong Chai, Menghua Zheng, Wei Li, Zhangchi Hu, Jie Chen, Zheyu Zhang, Hebei Li, Xiaoyan Sun

Due to the poor performance of simply transferring Mamba to 3D SNNs, SPM is designed to utilize both the sequence modeling capabilities of Mamba and the temporal feature extraction of SNNs.

Computational Efficiency Mamba

Spiking Neural Network for Intra-cortical Brain Signal Decoding

1 code implementation 12 Apr 2025 Song Yang, Haotian Fu, Herui Zhang, Peng Zhang, Wei Li, Dongrui Wu

Decoding brain signals accurately and efficiently is crucial for intra-cortical brain-computer interfaces.

BrainPrompt: Multi-Level Brain Prompt Enhancement for Neurological Condition Identification

no code implementations 12 Apr 2025 Jiaxing Xu, Kai He, Yue Tang, Wei Li, Mengcheng Lan, Xia Dong, Yiping Ke, Mengling Feng

In this paper, we present BrainPrompt, an innovative framework that enhances Graph Neural Networks (GNNs) by integrating Large Language Models (LLMs) with knowledge-driven prompts, enabling more effective capture of complex, non-imaging information and external knowledge for neurological disease identification.

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

no code implementations 30 Mar 2025 Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, Tat-Seng Chua

This paper introduces JavisDiT, a novel Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG).

Video Generation

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

1 code implementation 27 Mar 2025 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy

Unifying visual understanding and generation within a single multimodal framework remains a significant challenge, as the two inherently heterogeneous tasks require representations at different levels of granularity.

Image Generation Quantization

ACVUBench: Audio-Centric Video Understanding Benchmark

1 code implementation 25 Mar 2025 Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang

Audio often serves as an auxiliary modality in video understanding tasks of audio-visual large language models (LLMs), merely assisting in the comprehension of visual information.

Video Understanding

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

1 code implementation 25 Mar 2025 Yang Ren, Hai Jiang, Menglong Yang, Wei Li, Shuaicheng Liu

RAW-to-sRGB mapping, or the simulation of the traditional camera image signal processor (ISP), aims to generate DSLR-quality sRGB images from raw data captured by smartphone sensors.

CCMusic: An Open and Diverse Database for Chinese Music Information Retrieval Research

no code implementations 24 Mar 2025 Monan Zhou, Shenyang Xu, Zhaorui Liu, Zhaowen Wang, Feng Yu, Wei Li, Baoqiang Han

Data are crucial in various computer-related fields, including music information retrieval (MIR), an interdisciplinary area bridging computer science and music.

Information Retrieval Music Information Retrieval +1

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning

no code implementations 14 Mar 2025 Zixu Cheng, Jian Hu, Ziquan Liu, Chenyang Si, Wei Li, Shaogang Gong

Humans process video reasoning with a sequential spatio-temporal logic: we first identify the relevant frames ("when"), then analyse the spatial relationships ("where") between key objects, and finally leverage these relationships to draw inferences ("what").

Benchmarking Relational Reasoning +1

Spatial Distillation based Distribution Alignment (SDDA) for Cross-Headset EEG Classification

1 code implementation 7 Mar 2025 Dingkun Liu, Siyang Li, Ziwei Wang, Wei Li, Dongrui Wu

A non-invasive brain-computer interface (BCI) enables direct interaction between the user and external devices, typically via electroencephalogram (EEG) signals.

Brain Computer Interface EEG +3

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

no code implementations 7 Mar 2025 Hongwei Yi, Tian Ye, Shitong Shao, Xuancheng Yang, Jiantong Zhao, Hanzhong Guo, Terrance Wang, Qingyu Yin, Zeke Xie, Lei Zhu, Wei Li, Michael Lingelbach, Daquan Zhou

We present MagicInfinite, a novel diffusion Transformer (DiT) framework that overcomes traditional portrait animation limitations, delivering high-fidelity results across diverse character types: realistic humans, full-body figures, and stylized anime characters.

Denoising Portrait Animation +1

Feature Point Extraction for Extra-Affine Image

no code implementations 5 Mar 2025 Tao Wang, Yinghui Wang, Yanxing Liang, Liangyi Huang, Jinlong Yang, Wei Li, Xiaojuan Ning

The issue concerning the significant decline in the stability of feature extraction for images subjected to large-angle affine transformations, where the angle exceeds 50 degrees, still awaits a satisfactory solution.

Identifying Ising and percolation phase transitions based on KAN method

no code implementations 5 Mar 2025 Dian Xu, Shanshan Wang, Wei Li, Weibing Deng, Feng Gao, Jianmin Shen

The results demonstrate that the KAN can indeed predict the critical points of percolation models.

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

no code implementations 5 Mar 2025 Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie

However, existing online video assistants often sacrifice assistant efficacy for real-time efficiency by processing low-frame-rate videos with coarse-grained visual features. To overcome the trade-off between efficacy and efficiency, we propose "Fast & Slow Video-Language Thinker" as an onLIne videO assistaNt, LION-FS, achieving real-time, proactive, temporally accurate, and contextually precise responses.

Response Generation

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

no code implementations 2 Mar 2025 Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy

We also introduce a synthetic dataset to assess refocusing capabilities and the model's ability to learn precise lens parameters.

MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing

1 code implementation 28 Feb 2025 Xueyun Tian, Wei Li, Bingbing Xu, Yige Yuan, Yuanzhuo Wang, HuaWei Shen

Experiments show that MIGE excels in both subject-driven generation and instruction-based editing while setting a state-of-the-art in the new task of instruction-based subject-driven editing.

Image Generation Transfer Learning

Delta Decompression for MoE-based LLMs Compression

1 code implementation 24 Feb 2025 Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo

Based on observations of expert diversity, we decompose their weights into a shared base weight and unique delta weights.

Diversity Mixture-of-Experts

Autoregressive Image Generation Guided by Chains of Thought

no code implementations 24 Feb 2025 Miaomiao Cai, Guanjie Wang, Wei Li, Zhijun Tu, Hanting Chen, Shaohui Lin, Jie Hu

In the field of autoregressive (AR) image generation, models based on the 'next-token prediction' paradigm of LLMs have shown comparable performance to diffusion models by reducing inductive biases.

Image Generation Logical Reasoning

MVCNet: Multi-View Contrastive Network for Motor Imagery Classification

1 code implementation 18 Feb 2025 Ziwei Wang, Siyang Li, Xiaoqing Chen, Wei Li, Dongrui Wu

Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches.

Contrastive Learning Data Augmentation +4

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

no code implementations 17 Feb 2025 Guangzhi Sun, Yudong Yang, Jimin Zhuang, Changli Tang, Yixuan Li, Wei Li, Zejun Ma, Chao Zhang

video-SALMONN-o1 achieves 3-8% accuracy improvements over the LLaVA-OneVision baseline across different video reasoning benchmarks.

Language Modeling Language Modelling +2

CoPEFT: Fast Adaptation Framework for Multi-Agent Collaborative Perception with Parameter-Efficient Fine-Tuning

1 code implementation 15 Feb 2025 Quanmin Wei, Penglin Dai, Wei Li, Bingyi Liu, Xiao Wu

However, training a robust collaborative perception model requires collecting sufficient training data that covers all possible collaboration scenarios, which is impractical due to intolerable deployment costs.

Domain Adaptation parameter-efficient fine-tuning

Explanation based In-Context Demonstrations Retrieval for Multilingual Grammatical Error Correction

1 code implementation 12 Feb 2025 Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang

In this paper, we propose a novel retrieval method based on natural language grammatical error explanations (GEE) to address this issue.

Grammatical Error Correction In-Context Learning +2

CoS: Chain-of-Shot Prompting for Long Video Understanding

no code implementations 10 Feb 2025 Jian Hu, Zixu Cheng, Chenyang Si, Wei Li, Shaogang Gong

Multi-modal Large Language Models (MLLMs) struggle with long videos due to the need for excessive visual tokens.

Video Understanding

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages

2 code implementations 24 Jan 2025 JIA YU, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He

This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.

Diversity

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

1 code implementation 22 Jan 2025 Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, DaCheng Tao

Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge.

Mathematical Reasoning

DarkFarseer: Inductive Spatio-temporal Kriging via Hidden Style Enhancement and Sparsity-Noise Mitigation

no code implementations 6 Jan 2025 Zhuoxuan Liang, Wei Li, Dalin Zhang, Yidan Chen, Zhihong Wang, Xiangping Zheng, Moustafa Youssef

Based on graph neural networks (GNNs) extracting the relationships between physical and virtual sensors, ISK can infer the measurements of virtual sensors from physical sensors.

Contrastive Learning Denoising +2

Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation

no code implementations 5 Jan 2025 Jinkun Han, Wei Li, Zhipeng Cai, Yingshu Li

Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages.

Graph Neural Network

Quantum Cognition-Inspired EEG-based Recommendation via Graph Neural Networks

no code implementations 5 Jan 2025 Jinkun Han, Wei Li, Yingshu Li, Zhipeng Cai

Current recommendation systems recommend goods by considering users' historical behaviors, social relations, ratings, and other multimodal information.

EEG Recommendation Systems

ZeroFlow: Overcoming Catastrophic Forgetting is Easier than You Think

no code implementations 2 Jan 2025 Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang

This work provides essential insights and tools for advancing forward pass methods to overcome forgetting.

Continual Learning

Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation

1 code implementation 1 Jan 2025 Mingjia Li, Shuang Li, Tongrui Su, Longhui Yuan, Jian Liang, Wei Li

Capitalizing on the complementary advantages of generative and discriminative models has always been a compelling vision in machine learning, backed by a growing body of research.

Denoising Test-time Adaptation

AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

no code implementations 24 Dec 2024 Pufan Zou, Shijia Zhao, Weijie Huang, Qiming Xia, Chenglu Wen, Wei Li, Cheng Wang

Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks.

3D Semantic Segmentation

Generating Unseen Nonlinear Evolution in Sea Surface Temperature Using a Deep Learning-Based Latent Space Data Assimilation Framework

no code implementations 18 Dec 2024 Qingyu Zheng, Guijun Han, Wei Li, Lige Cao, Gongfu Zhou, Haowen Wu, Qi Shao, Ru Wang, Xiaobo Wu, Xudong Cui, Hong Li, Xuan Wang

To fuse multi-source data and reconstruct the nonlinear evolution missing from observations, geoscientists are developing future-oriented DA methods.

NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks

no code implementations 18 Dec 2024 Jiaping Ren, Jiahao Xiang, Hongfei Gao, Jinchuan Zhang, Yiming Ren, Yuexin Ma, Yi Wu, Ruigang Yang, Wei Li

Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks, since it lowers costs and decreases carbon emissions.

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

1 code implementation 12 Dec 2024 Yunshuai Zhou, Junbo Qiao, Jincheng Liao, Wei Li, Simiao Li, Jiao Xie, Yunhang Shen, Jie Hu, Shaohui Lin

However, previous KD methods for image restoration overlook the state of the student during the distillation, adopting a fixed solution space that limits the capability of KD.

Contrastive Learning Image Restoration +1

GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression

1 code implementation 12 Dec 2024 Ziqi Zhou, Weize Quan, Hailin Shi, Wei Li, Lili Wang, Dong-Ming Yan

Audio-driven talking head generation necessitates seamless integration of audio and visual data amidst the challenges posed by diverse input portraits and intricate correlations between audio and facial motions.

Disentanglement Portrait Animation +1

FD-LLM: Large Language Model for Fault Diagnosis of Machines

no code implementations 2 Dec 2024 Hamzah A. A. M. Qaid, Bo Zhang, Dan Li, See-Kiong Ng, Wei Li

We assess the fault diagnosis capabilities of four open-sourced LLMs based on the FD-LLM framework, and evaluate the models' adaptability and generalizability under various operational conditions and machine components, namely for traditional fault diagnosis, cross-operational conditions, and cross-machine component settings.

Fault Detection Fault Diagnosis +4

HiFiVFS: High Fidelity Video Face Swapping

no code implementations 27 Nov 2024 Xu Chen, Keke He, Junwei Zhu, Yanhao Ge, Wei Li, Chengjie Wang

Face swapping aims to generate results that combine the identity from the source with attributes from the target.

Attribute Face Swapping

OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining

no code implementations 23 Nov 2024 Ming Hu, Kun Yuan, Yaling Shen, Feilong Tang, Xiaohao Xu, Lin Zhou, Wei Li, Ying Chen, Zhongxing Xu, Zelin Peng, Siyuan Yan, Vinkle Srivastav, Diping Song, Tianbin Li, Danli Shi, Jin Ye, Nicolas Padoy, Nassir Navab, Junjun He, ZongYuan Ge

Surgical practice involves complex visual interpretation, procedural skills, and advanced medical knowledge, making surgical vision-language pretraining (VLP) particularly challenging due to this complexity and the limited availability of annotated data.

Representation Learning Retrieval

A Survey on LLM-as-a-Judge

2 code implementations 23 Nov 2024 Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, Jian Guo

Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale.

Models Alignment Survey

Fact-Level Confidence Calibration and Self-Correction

1 code implementation 20 Nov 2024 Yige Yuan, Bingbing Xu, Hexiang Tan, Fei Sun, Teng Xiao, Wei Li, HuaWei Shen, Xueqi Cheng

Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual accuracy of their responses, enables them to self-evaluate the correctness of their outputs.

UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

no code implementations 13 Nov 2024 Jiaxin Wan, Lin Liu, Haoran Wang, Liangwei Li, Wei Li, Shuheng Kou, Runtian Li, Jiayi Tang, Juanxiu Liu, Jing Zhang, Xiaohui Du, Ruqian Hao

Total hip arthroplasty (THA) relies on accurate landmark detection from radiographic images, but unstructured data caused by irregular patient postures or occluded anatomical markers pose significant challenges for existing methods.

A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

5 code implementations 13 Nov 2024 Yingqi Gao, Yifu Liu, Xiaoxia Li, Xiaorong Shi, Yin Zhu, Yiming Wang, Shiqi Li, Wei Li, Yuntao Hong, Zhiling Luo, Jinyang Gao, Liyu Mou, Yu Li

On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities.

Diversity In-Context Learning +3

Sub-DM: Subspace Diffusion Model with Orthogonal Decomposition for MRI Reconstruction

1 code implementation 6 Nov 2024 Yu Guan, Qinrong Cai, Wei Li, Qiuyun Fan, Dong Liang, Qiegen Liu

To tackle these challenges, we introduce a subspace diffusion model with orthogonal decomposition, a method (referred to as Sub-DM) that restricts the diffusion process via projections onto a subspace as the k-space data distribution evolves toward noise.

MRI Reconstruction

Technical Report for Soccernet 2023 -- Dense Video Captioning

no code implementations 31 Oct 2024 Zheng Ruan, Ruixuan Liu, Shimin Chen, Mengying Zhou, Xinquan Yang, Wei Li, Chen Chen, Wei Shen

For the dense video captioning task on the Soccernet dataset, we propose to generate a video caption for each soccer action and locate the timestamp of the caption.

Dense Video Captioning

PGDiffSeg: Prior-Guided Denoising Diffusion Model with Parameter-Shared Attention for Breast Cancer Segmentation

no code implementations 23 Oct 2024 Feiyan Feng, Tianyu Liu, Hong Wang, Jun Zhao, Wei Li, Yanshen Sun

Therefore, this paper proposes a novel PGDiffSeg (Prior-Guided Diffusion Denoising Model with Parameter-Shared Attention) that applies diffusion denoising methods to breast cancer medical image segmentation, accurately recovering the affected areas from Gaussian noise.

Denoising Image Segmentation +3

Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution

no code implementations 14 Oct 2024 Junbo Qiao, Jincheng Liao, Wei Li, Yulun Zhang, Yong Guo, Yi Wen, Zhangxizi Qiu, Jiao Xie, Jie Hu, Shaohui Lin

State Space Models (SSM), such as Mamba, have shown strong representation ability in modeling long-range dependency with linear complexity, achieving successful applications from high-level to low-level vision tasks.

Image Super-Resolution Mamba +1

Can a large language model be a gaslighter?

1 code implementation 11 Oct 2024 Wei Li, Luyao Zhu, Yang Song, Ruixi Lin, Rui Mao, Yang You

In contrast, we advanced three safety alignment strategies to strengthen (by 12.05%) the safety guardrail of LLMs.

Language Modeling Language Modelling +3

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation

no code implementations 10 Oct 2024 Shanyan Guan, Yanhao Ge, Ying Tai, Jian Yang, Wei Li, Mingyu You

Recent advancements in text-to-image diffusion models have shown remarkable creative capabilities with textual prompts, but generating personalized instances based on specific subjects, known as subject-driven generation, remains challenging.

Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics

no code implementations 10 Oct 2024 Junyi Cao, Shanyan Guan, Yanhao Ge, Wei Li, Xiaokang Yang, Chao Ma

While humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle.

Visual Grounding

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

no code implementations 9 Oct 2024 Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang

To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels.

Audio captioning Large Language Model +4

MRF-Net: An Infrared Remote Sensing Image Thin Cloud Removal Method With the Intra-Inter Coherent Constraint

1 code implementation TGRS 2024 Qizhi Xu, Jiuchen Chen, Xinyu Yan, Wei Li

To address this problem, we proposed the multiscale residual fusion network (MRF-Net) to remove thin cloud from infrared remote sensing imagery.

Cloud Removal

Video Instruction Tuning With Synthetic Data

no code implementations 3 Oct 2024 Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li

The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web.

3D Question Answering (3D-QA) Instruction Following +3

A Generalized Tensor Formulation for Hyperspectral Image Super-Resolution Under General Spatial Blurring

no code implementations 27 Sep 2024 Yinjian Wang, Wei Li, Yuanyuan Gui, Qian Du, James E. Fowler

Hyperspectral super-resolution is commonly accomplished by fusing a hyperspectral image of low spatial resolution with a multispectral image of high spatial resolution, and many tensor-based approaches to this task have been recently proposed.

Hyperspectral Image Super-Resolution Image Super-Resolution

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

1 code implementation 17 Sep 2024 Jiaxing Xu, Kai He, Mengcheng Lan, Qingtian Bian, Wei Li, Tieying Li, Yiping Ke, Miao Qiao

It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism.

BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking

no code implementations 22 Aug 2024 Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du

This bias allows the tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images without the need to extract spectral features.

Object Tracking

Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation

1 code implementation 20 Aug 2024 Jiawei Han, Kaiqi Liu, Wei Li, Guangzhi Chen

Specifically, the point cloud is initially separated into independent point sets by category to provide initial conditions for the generation of feature subspaces.

Segmentation Semantic Segmentation

Deep Code Search with Naming-Agnostic Contrastive Multi-View Learning

no code implementations 18 Aug 2024 Jiadong Feng, Wei Li, Zhao Wei, Yong Xu, Juhong Wang, Hui Li

However, developers may not follow the same naming conventions and the same variable may have different variable names in different implementations, bringing a challenge to deep learning based code search methods that rely on explicit variable correspondences to understand source code.

Code Search Contrastive Learning +3

Beyond Inter-Item Relations: Dynamic Adaption for Enhancing LLM-Based Sequential Recommendation

no code implementations 14 Aug 2024 CanYi Liu, Wei Li, Youchen Zhang, Hui Li, Rongrong Ji

Built on top of coarse-grained adaption for capturing inter-item relations, DARec is further enhanced with (1) context masking that models intra-item relations to help LLM better understand token and item semantics in the context of SRS, (2) collaborative knowledge injection that helps LLM incorporate long-term collaborative knowledge, and (3) a dynamic adaption mechanism that uses Bayesian optimization to flexibly choose layer-wise adapter architectures in order to better incorporate different sequential information.

Bayesian Optimization Sequential Recommendation

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

1 code implementation 9 Aug 2024 Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr

To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.

3D Object Detection Autonomous Driving +3

Respiratory Subtraction for Pulmonary Microwave Ablation Evaluation

no code implementations 8 Aug 2024 Wan Li, Xinyun Zhong, Wei Li, Song Zhang, Moheng Rong, Yan Xi, Peng Yuan, Zechen Wang, Xiaolei Jiang, Rongxi Yi, Hui Tang, Yang Chen, Chaohui Tong, Zhan Wu, Feng Wang

The experimental results confirm the effectiveness of the respiratory subtraction method and the proposed quantitative evaluation metric in assessing lung tumor treatment.

MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation

no code implementations 6 Aug 2024 Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi

In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences.

Denoising Gesture Generation

Dynamic Object Queries for Transformer-based Incremental Object Detection

no code implementations 31 Jul 2024 Jichuan Zhang, Wei Li, Shuang Cheng, Ya-Li Li, Shengjin Wang

These new object queries are aggregated with those from previous phases to adapt both old and new knowledge well.

Knowledge Distillation Object +2

The Llama 3 Herd of Models

2 code implementations31 Jul 2024 Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, Danny Wyatt, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Francisco Guzmán, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Govind Thattai, Graeme Nail, Gregoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel Kloumann, Ishan Misra, Ivan Evtimov, Jack Zhang, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer Van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Karthik Prasad, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, Khalid El-Arini, Krithika Iyer, Kshitiz Malik, Kuenley Chiu, Kunal Bhalla, Kushal Lakhotia, Lauren Rantala-Yeary, Laurens van der Maaten, Lawrence Chen, Liang Tan, Liz Jenkins, Louis Martin, Lovish Madaan, Lubo Malo, Lukas Blecher, Lukas Landzaat, Luke de Oliveira, Madeline Muzzi, Mahesh Pasupuleti, Mannat Singh, Manohar Paluri, Marcin Kardas, Maria 
Tsimpoukelli, Mathew Oldham, Mathieu Rita, Maya Pavlova, Melanie Kambadur, Mike Lewis, Min Si, Mitesh Kumar Singh, Mona Hassan, Naman Goyal, Narjes Torabi, Nikolay Bashlykov, Nikolay Bogoychev, Niladri Chatterji, Ning Zhang, Olivier Duchenne, Onur Çelebi, Patrick Alrassy, Pengchuan Zhang, Pengwei Li, Petar Vasic, Peter Weng, Prajjwal Bhargava, Pratik Dubal, Praveen Krishnan, Punit Singh Koura, Puxin Xu, Qing He, Qingxiao Dong, Ragavan Srinivasan, Raj Ganapathy, Ramon Calderer, Ricardo Silveira Cabral, Robert Stojnic, Roberta Raileanu, Rohan Maheswari, Rohit Girdhar, Rohit Patel, Romain Sauvestre, Ronnie Polidoro, Roshan Sumbaly, Ross Taylor, Ruan Silva, Rui Hou, Rui Wang, Saghar Hosseini, Sahana Chennabasappa, Sanjay Singh, Sean Bell, Seohyun Sonia Kim, Sergey Edunov, Shaoliang Nie, Sharan Narang, Sharath Raparthy, Sheng Shen, Shengye Wan, Shruti Bhosale, Shun Zhang, Simon Vandenhende, Soumya Batra, Spencer Whitman, Sten Sootla, Stephane Collot, Suchin Gururangan, Sydney Borodinsky, Tamar Herman, Tara Fowler, Tarek Sheasha, Thomas Georgiou, Thomas Scialom, Tobias Speckbacher, Todor Mihaylov, Tong Xiao, Ujjwal Karn, Vedanuj Goswami, Vibhor Gupta, Vignesh Ramanathan, Viktor Kerkez, Vincent Gonguet, Virginie Do, Vish Vogeti, Vítor Albiero, Vladan Petrovic, Weiwei Chu, Wenhan Xiong, Wenyin Fu, Whitney Meers, Xavier Martinet, Xiaodong Wang, Xiaofang Wang, Xiaoqing Ellen Tan, Xide Xia, Xinfeng Xie, Xuchao Jia, Xuewei Wang, Yaelle Goldschlag, Yashesh Gaur, Yasmine Babaei, Yi Wen, Yiwen Song, Yuchen Zhang, Yue Li, Yuning Mao, Zacharie Delpierre Coudert, Zheng Yan, Zhengxing Chen, Zoe Papakipos, Aaditya Singh, Aayushi Srivastava, Abha Jain, Adam Kelsey, Adam Shajnfeld, Adithya Gangidi, Adolfo Victoria, Ahuva Goldstand, Ajay Menon, Ajay Sharma, Alex Boesenberg, Alexei Baevski, Allie Feinstein, Amanda Kallet, Amit Sangani, Amos Teo, Anam Yunus, Andrei Lupu, Andres Alvarado, Andrew Caples, Andrew Gu, Andrew Ho, Andrew Poulton, Andrew Ryan, Ankit Ramchandani, Annie Dong, Annie 
Franco, Anuj Goyal, Aparajita Saraf, Arkabandhu Chowdhury, Ashley Gabriel, Ashwin Bharambe, Assaf Eisenman, Azadeh Yazdan, Beau James, Ben Maurer, Benjamin Leonhardi, Bernie Huang, Beth Loyd, Beto De Paola, Bhargavi Paranjape, Bing Liu, Bo Wu, Boyu Ni, Braden Hancock, Bram Wasti, Brandon Spence, Brani Stojkovic, Brian Gamido, Britt Montalvo, Carl Parker, Carly Burton, Catalina Mejia, Ce Liu, Changhan Wang, Changkyu Kim, Chao Zhou, Chester Hu, Ching-Hsiang Chu, Chris Cai, Chris Tindal, Christoph Feichtenhofer, Cynthia Gao, Damon Civin, Dana Beaty, Daniel Kreymer, Daniel Li, David Adkins, David Xu, Davide Testuggine, Delia David, Devi Parikh, Diana Liskovich, Didem Foss, Dingkang Wang, Duc Le, Dustin Holland, Edward Dowling, Eissa Jamil, Elaine Montgomery, Eleonora Presani, Emily Hahn, Emily Wood, Eric-Tuan Le, Erik Brinkman, Esteban Arcaute, Evan Dunbar, Evan Smothers, Fei Sun, Felix Kreuk, Feng Tian, Filippos Kokkinos, Firat Ozgenel, Francesco Caggioni, Frank Kanayet, Frank Seide, Gabriela Medina Florez, Gabriella Schwarz, Gada Badeer, Georgia Swee, Gil Halpern, Grant Herman, Grigory Sizov, Guangyi Zhang, Guna Lakshminarayanan, Hakan Inan, Hamid Shojanazeri, Han Zou, Hannah Wang, Hanwen Zha, Haroun Habeeb, Harrison Rudolph, Helen Suk, Henry Aspegren, Hunter Goldman, Hongyuan Zhan, Ibrahim Damlaj, Igor Molybog, Igor Tufanov, Ilias Leontiadis, Irina-Elena Veliche, Itai Gat, Jake Weissman, James Geboski, James Kohli, Janice Lam, Japhet Asher, Jean-Baptiste Gaya, Jeff Marcus, Jeff Tang, Jennifer Chan, Jenny Zhen, Jeremy Reizenstein, Jeremy Teboul, Jessica Zhong, Jian Jin, Jingyi Yang, Joe Cummings, Jon Carvill, Jon Shepard, Jonathan McPhie, Jonathan Torres, Josh Ginsburg, Junjie Wang, Kai Wu, Kam Hou U, Karan Saxena, Kartikay Khandelwal, Katayoun Zand, Kathy Matosich, Kaushik Veeraraghavan, Kelly Michelena, Keqian Li, Kiran Jagadeesh, Kun Huang, Kunal Chawla, Kyle Huang, Lailin Chen, Lakshya Garg, Lavender A, Leandro Silva, Lee Bell, Lei Zhang, Liangpeng Guo, Licheng 
Yu, Liron Moshkovich, Luca Wehrstedt, Madian Khabsa, Manav Avalani, Manish Bhatt, Martynas Mankus, Matan Hasson, Matthew Lennie, Matthias Reso, Maxim Groshev, Maxim Naumov, Maya Lathi, Meghan Keneally, Miao Liu, Michael L. Seltzer, Michal Valko, Michelle Restrepo, Mihir Patel, Mik Vyatskov, Mikayel Samvelyan, Mike Clark, Mike Macey, Mike Wang, Miquel Jubert Hermoso, Mo Metanat, Mohammad Rastegari, Munish Bansal, Nandhini Santhanam, Natascha Parks, Natasha White, Navyata Bawa, Nayan Singhal, Nick Egebo, Nicolas Usunier, Nikhil Mehta, Nikolay Pavlovich Laptev, Ning Dong, Norman Cheng, Oleg Chernoguz, Olivia Hart, Omkar Salpekar, Ozlem Kalinli, Parkin Kent, Parth Parekh, Paul Saab, Pavan Balaji, Pedro Rittner, Philip Bontrager, Pierre Roux, Piotr Dollar, Polina Zvyagina, Prashant Ratanchandani, Pritish Yuvraj, Qian Liang, Rachad Alao, Rachel Rodriguez, Rafi Ayub, Raghotham Murthy, Raghu Nayani, Rahul Mitra, Rangaprabhu Parthasarathy, Raymond Li, Rebekkah Hogan, Robin Battey, Rocky Wang, Russ Howes, Ruty Rinott, Sachin Mehta, Sachin Siby, Sai Jayesh Bondu, Samyak Datta, Sara Chugh, Sara Hunt, Sargun Dhillon, Sasha Sidorov, Satadru Pan, Saurabh Mahajan, Saurabh Verma, Seiji Yamamoto, Sharadh Ramaswamy, Shaun Lindsay, Sheng Feng, Shenghao Lin, Shengxin Cindy Zha, Shishir Patil, Shiva Shankar, Shuqiang Zhang, Sinong Wang, Sneha Agarwal, Soji Sajuyigbe, Soumith Chintala, Stephanie Max, Stephen Chen, Steve Kehoe, Steve Satterfield, Sudarshan Govindaprasad, Sumit Gupta, Summer Deng, Sungmin Cho, Sunny Virk, Suraj Subramanian, Sy Choudhury, Sydney Goldman, Tal Remez, Tamar Glaser, Tamara Best, Thilo Koehler, Thomas Robinson, Tianhe Li, Tianjun Zhang, Tim Matthews, Timothy Chou, Tzook Shaked, Varun Vontimitta, Victoria Ajayi, Victoria Montanez, Vijai Mohan, Vinay Satish Kumar, Vishal Mangla, Vlad Ionescu, Vlad Poenaru, Vlad Tiberiu Mihailescu, Vladimir Ivanov, Wei Li, Wenchen Wang, WenWen Jiang, Wes Bouaziz, Will Constable, Xiaocheng Tang, Xiaojian Wu, Xiaolan Wang, Xilun Wu, 
Xinbo Gao, Yaniv Kleinman, Yanjun Chen, Ye Hu, Ye Jia, Ye Qi, Yenda Li, Yilin Zhang, Ying Zhang, Yossi Adi, Youngjin Nam, Yu Wang, Yu Zhao, Yuchen Hao, Yundi Qian, Yunlu Li, Yuzi He, Zach Rait, Zachary DeVito, Zef Rosnbrick, Zhaoduo Wen, Zhenyu Yang, Zhiwei Zhao, Zhiyu Ma

This paper presents a new set of foundation models, called Llama 3.

answerability prediction Language Modeling +5

Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective

no code implementations11 Jul 2024 Runyuan Ma, Wei Li, FuKai Shang

Construction techniques and methods for public fine-tuning datasets of Large Language Models (LLMs), including data generation and data augmentation among others, are detailed.

Data Augmentation

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

2 code implementations10 Jul 2024 Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li

To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs.

Zero-Shot Video Question Answer

Music Era Recognition Using Supervised Contrastive Learning and Artist Information

no code implementations7 Jul 2024 Qiqi He, Xuchen Song, Weituo Hao, Ju-Chiang Wang, Wei-Tsung Lu, Wei Li

For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training.

Contrastive Learning Music Classification

WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

1 code implementation2 Jul 2024 Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

However, their annotation requirements are impractical for real-world images or videos, posing challenges for real-world applications of current avatar creation methods.

Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting

1 code implementation28 Jun 2024 Wei Li, Jingyang Zhang, Pheng-Ann Heng, Lixu Gu

Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.

Denoising Incremental Learning +2

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

1 code implementation24 Jun 2024 Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL).

Image Manipulation Image Manipulation Detection

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

1 code implementation22 Jun 2024 Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

To obtain the fine-grained temporal information required by speech understanding while remaining efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model.

Diversity Language Modeling +3

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

no code implementations12 Jun 2024 Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang

This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable.

Automatic Speech Recognition Decoder +2

HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

no code implementations11 Jun 2024 Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang

Leveraging HalluDial, we conduct a comprehensive meta-evaluation of LLMs' hallucination evaluation capabilities in information-seeking dialogues and introduce a specialized judge language model, HalluJudge.

Hallucination Hallucination Evaluation +2

F-LMM: Grounding Frozen Large Multimodal Models

2 code implementations9 Jun 2024 Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.

General Knowledge Instruction Following +5

PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection

1 code implementation9 Jun 2024 Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang

PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels with dropout applied during inference, while backdoor samples exhibit less PS.

Prediction

On the Effects of Data Scale on UI Control Agents

no code implementations6 Jun 2024 Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva

Moreover, AndroidControl is the most diverse computer control dataset to date, including 14,548 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of the model performance in and out of the domain of the training data.

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

no code implementations4 Jun 2024 Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin

The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications.

Diversity

Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation

no code implementations29 May 2024 Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding

When novel classes are learned incrementally, the data distribution of old classes is destroyed, leading to catastrophic forgetting.

Few-Shot Semantic Segmentation Semantic Segmentation

Detection-Correction Structure via General Language Model for Grammatical Error Correction

1 code implementation28 May 2024 Wei Li, Houfeng Wang

Grammatical error correction (GEC) is a task dedicated to rectifying texts with minimal edits, which can be decoupled into two components: detection and correction.

Grammatical Error Correction Language Modeling +2

Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

no code implementations25 May 2024 Junlin Song, Yuzhuo Chen, Yuan YAO, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods.

Diagnostic

Collaboration of Teachers for Semi-supervised Object Detection

no code implementations22 May 2024 Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu

To address these issues, we propose the Collaboration of Teachers Framework (CTF), which consists of multiple pairs of teacher and student models for training.

Object object-detection +2

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

no code implementations21 May 2024 Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong liu

We observe that both face reenactment/swapping tasks essentially involve combinations of target structure, ID and attribute.

Attribute Decoder +1

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

1 code implementation20 May 2024 Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes.

NeRF Novel View Synthesis

Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortions

no code implementations8 May 2024 Sijing Xie, Chengxin Zhao, Nan Sun, Wei Li, Hefei Ling

To improve the robustness of the decoder against stronger noise, this paper proposes to introduce a denoise module between the noise layer and the decoder.

Decoder

ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios

no code implementations7 May 2024 Dingrui Wang, Zheyuan Lai, Yuda Li, Yi Wu, Yuexin Ma, Johannes Betz, Ruigang Yang, Wei Li

Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events that unfold within fractions of a second.

Autonomous Driving Prediction

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

no code implementations29 Apr 2024 Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture.

Common Sense Reasoning Multiple-choice

BezierFormer: A Unified Architecture for 2D and 3D Lane Detection

no code implementations25 Apr 2024 Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

BézierFormer formulates queries as Bézier control points and incorporates a novel Bézier curve attention mechanism.

3D Lane Detection

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

3 code implementations16 Apr 2024 Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

LIPT: Latency-aware Image Processing Transformer

1 code implementation9 Apr 2024 Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin

Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT on both latency and PSNR.

Image Denoising Image Super-Resolution

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

no code implementations3 Apr 2024 Simiao Li, Yun Zhang, Wei Li, Hanting Chen, Wenjia Wang, BingYi Jing, Shaohui Lin, Jie Hu

Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model.

Image Super-Resolution Knowledge Distillation +1

Make Continual Learning Stronger via C-Flat

1 code implementation1 Apr 2024 Ang Bian, Wei Li, Hangjie Yuan, Chengrong Yu, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng

A general framework of C-Flat applied to all CL categories and a thorough comparison with loss-minima optimizers and flat-minima-based CL approaches are presented in this paper, showing that our method can boost CL performance in almost all cases.

Continual Learning

IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions

no code implementations31 Mar 2024 Zhijun Tu, Kunpeng Du, Hanting Chen, Hailing Wang, Wei Li, Jie Hu, Yunhe Wang

Recent advances have demonstrated the powerful capability of transformer architecture in image restoration.

Deblurring Denoising +3

InternLM2 Technical Report

3 code implementations26 Mar 2024 Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

4k Long-Context Understanding

CodeS: Natural Language to Code Repository via Multi-Layer Sketch

2 code implementations25 Mar 2024 Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui

For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in conducting empirical studies.

Benchmarking

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

1 code implementation21 Mar 2024 Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

We introduce CHARM, the first benchmark for comprehensive, in-depth evaluation of the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense.

Benchmarking Memorization

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

2 code implementations20 Mar 2024 Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.

Contrastive Learning Fine-Grained Visual Recognition +3

Parameter Efficient Reinforcement Learning from Human Feedback

no code implementations15 Mar 2024 Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Simral Chaudhary, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

In this work, we empirically evaluate the setup of Parameter Efficient Reinforcement Learning from Human Feedback (PE-RLHF) that leverages LoRA fine-tuning for Reward Modeling and Reinforcement Learning.

Question Answering reinforcement-learning +3

KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

1 code implementation12 Mar 2024 Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves improvements of up to 12.5% and 21.9% over state-of-the-art baselines under the zero-shot setting and the low-resource setting, respectively.

Code Generation Language Modelling +2

SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking

no code implementations9 Mar 2024 Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du, Jing Tian

Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking.

Object Object Tracking

Unlocking the Power of Large Language Models for Entity Alignment

1 code implementation23 Feb 2024 Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Zixuan Li, Jian Guo, HuaWei Shen, Yuanzhuo Wang

To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy.

Code Translation Entity Alignment +2

An Error-Matching Exclusion Method for Accelerating Visual SLAM

no code implementations22 Feb 2024 Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

In Visual SLAM, achieving accurate feature matching consumes a significant amount of time, severely impacting the real-time performance of the system.

A Feature Matching Method Based on Multi-Level Refinement Strategy

no code implementations21 Feb 2024 Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

Feature matching is a fundamental and crucial process in visual SLAM, and precision has always been a challenging issue in feature matching.

A Robust Error-Resistant View Selection Method for 3D Reconstruction

no code implementations18 Feb 2024 Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

To address the issue of increased triangulation uncertainty caused by selecting views with small camera baselines in Structure from Motion (SFM) view selection, this paper proposes a robust error-resistant view selection method.

3D Reconstruction

Region Feature Descriptor Adapted to High Affine Transformations

no code implementations15 Feb 2024 Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

To address the issue of feature descriptors being ineffective in representing grayscale feature information when images undergo high affine transformations, leading to a rapid decline in feature matching accuracy, this paper proposes a region feature descriptor based on simulating affine transformations using classification.

A Highlight Removal Method for Capsule Endoscopy Images

no code implementations11 Feb 2024 Shaojie Zhang, Yinghui Wang, Peixuan Liu, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov

The images captured by Wireless Capsule Endoscopy (WCE) always exhibit specular reflections, and removing highlights while preserving the color and texture in the region remains a challenge.

highlight removal

UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

no code implementations24 Jan 2024 Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao

This paper presents UNIMO-G, a simple multimodal conditional diffusion framework that operates on multimodal prompts with interleaved textual and visual inputs, which demonstrates a unified ability for both text-driven and subject-driven image generation.

Conditional Image Generation Denoising +6

OMG-Seg: Is One Model Good Enough For All Segmentation?

1 code implementation CVPR 2024 Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.

All Decoder +5

Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

no code implementations2 Jan 2024 Shuang Li, Ke Li, Wei Li, Ming Yang

Constrained multi-objective optimization problems (CMOPs) pervade real-world applications in science, engineering, and design.

A Generalist FaceX via Learning Unified Facial Representation

1 code implementation31 Dec 2023 Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong liu, Xiaoming Liu, Ying Tai

This work presents the FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.

Facial Editing

DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection

1 code implementation25 Dec 2023 Li Xiang, Junbo Yin, Wei Li, Cheng-Zhong Xu, Ruigang Yang, Jianbing Shen

Specifically, DMA builds a domain-mixing 3D instance bank for the teacher and student models during training, resulting in aligned data representation.

3D Object Detection object-detection +1

Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object Detection

no code implementations20 Dec 2023 Xisheng Li, Wei Li, Pinhao Song, Mingjun Zhang, Jie zhou

The inherent characteristics and light fluctuations of water bodies give rise to huge differences between different layers and regions in underwater environments.

Data Augmentation object-detection +1

UINav: A Practical Approach to Train On-Device Automation Agents

no code implementations15 Dec 2023 Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Max Lin, Oriana Riva

Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired.

Diversity

CGS-Mask: Making Time Series Predictions Intuitive for All

no code implementations15 Dec 2023 Feng Lu, Wei Li, Yifei Sun, Cheng Song, Yufei Ren, Albert Y. Zomaya

Artificial intelligence (AI) has immense potential in time series prediction, but most explainable tools have limited capabilities in providing a systematic understanding of important features over time.

All Decision Making +3

CBQ: Cross-Block Quantization for Large Language Models

no code implementations13 Dec 2023 Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang

Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.

Quantization

Brain Computer Interface Technology for Future Battlefield

no code implementations13 Dec 2023 Guodong Xiong, Xinyan Ma, Wei Li, Jiaqi Cao, Jian Zhong, Yicong Su

With the development of artificial intelligence and unmanned equipment, human-machine hybrid formations will be the main focus in future combat formations.

Brain Computer Interface Decision Making

GenDet: Towards Good Generalizations for AI-Generated Image Detection

1 code implementation12 Dec 2023 Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, Yunhe Wang

The misuse of AI imagery can have harmful societal effects, prompting the creation of detectors to combat issues like the spread of fake news.

Anomaly Detection

Knowledge Graph Driven Recommendation System Algorithm

no code implementations1 Dec 2023 Chaoyang Zhang, Yanan Li, Shen Chen, Siwei Fan, Wei Li

We first use a single-layer neural network to merge individual node features in the graph, and then adjust the aggregation weights of neighboring entities by incorporating influence factors.

Graph Neural Network

Identifying percolation phase transitions with unsupervised learning based on largest clusters

no code implementations20 Nov 2023 Dian Xu, Shanshan Wang, Weibing Deng, Feng Gao, Wei Li, Jianmin Shen

This paper suggests that, by inputting the largest cluster rather than the original configuration into the learning model, unsupervised learning can indeed predict the critical point of the percolation model.

FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment

no code implementations 9 Nov 2023 Qinghua Lin, Zuoyong Li, Kun Zeng, Haoyi Fan, Wei Li, Xiaoguang Zhou

Considering the limited quantity of labeled video data, we propose a semi-supervised fire detection model called FireMatch, which is based on consistency regularization and adversarial distribution alignment.

Data Augmentation Fairness +2

An invariant feature extraction for multi-modal images matching

no code implementations 6 Nov 2023 Chenzhong Gao, Wei Li

This paper aims to provide an effective invariant feature extraction and matching algorithm for multi-modal images, for use in multi-source data analysis.

Video-Helpful Multimodal Machine Translation

1 code implementation 31 Oct 2023 Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li

In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation.

Multimodal Machine Translation Translation

SALMONN: Towards Generic Hearing Abilities for Large Language Models

1 code implementation 20 Oct 2023 Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music.

Audio captioning Automatic Speech Recognition +10

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

1 code implementation 15 Oct 2023 Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li

Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.

Instrument Playing Technique Detection Onset Detection +1

On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks

no code implementations 9 Oct 2023 Xin Liu, Wei Li, Dazhi Zhan, Yu Pan, Xin Ma, Yu Ding, Zhisong Pan

Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data.

Federated Learning

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

2 code implementations 9 Oct 2023 Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs.

Question Answering Video Question Answering

Cross-head mutual Mean-Teaching for semi-supervised medical image segmentation

1 code implementation 8 Oct 2023 Wei Li, Ruifeng Bian, Wenyi Zhao, Weijin Xu, Huihua Yang

To address these concerns, we propose a novel Cross-head mutual mean-teaching Network (CMMT-Net) incorporating strong-weak data augmentation, thereby benefitting both self-training and consistency learning.

Data Augmentation Image Segmentation +2

A Holistic Evaluation of Piano Sound Quality

no code implementations 7 Oct 2023 Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li

Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos.

Diversity Few-Shot Learning

Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training

no code implementations 29 Sep 2023 Runnan Chen, Xinge Zhu, Nenglun Chen, Dawei Wang, Wei Li, Yuexin Ma, Ruigang Yang, Tongliang Liu, Wenping Wang

In this paper, we propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.

3D Semantic Segmentation Object

IFT: Image Fusion Transformer for Ghost-free High Dynamic Range Imaging

no code implementations 26 Sep 2023 Hailing Wang, Wei Li, Yuanyuan Xi, Jie Hu, Hanting Chen, Longyu Li, Yunhe Wang

By matching similar patches between frames, objects with large motion ranges in dynamic scenes can be aligned, which can effectively alleviate the generation of artifacts.

Connecting Speech Encoder and Large Language Model for ASR

no code implementations 25 Sep 2023 Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Data Upcycling Knowledge Distillation for Image Super-Resolution

1 code implementation 25 Sep 2023 Yun Zhang, Wei Li, Simiao Li, Hanting Chen, Zhijun Tu, Wenjia Wang, BingYi Jing, Shaohui Lin, Jie Hu

Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models.

Image Super-Resolution Knowledge Distillation +1

EMelodyGen: Emotion-Conditioned Melody Generation in ABC Notation with the Musical Feature Template

1 code implementation 23 Sep 2023 Monan Zhou, Xiaobing Li, Feng Yu, Wei Li

The EMelodyGen system focuses on emotional melody generation in ABC notation controlled by the musical feature template.

Data Augmentation Emotion Classification +4

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

1 code implementation 22 Sep 2023 Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.

Data Augmentation Instance Segmentation +1

MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models

no code implementations 21 Sep 2023 Yidong Liu, FuKai Shang, Fang Wang, Rui Xu, Jun Wang, Wei Li, Yao Li, Conghui He

With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have demonstrated exceptional capabilities across various domains.

Deep Learning

SoccerNet 2023 Challenges Results

2 code implementations 12 Sep 2023 Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng

More information on the tasks, challenges, and leaderboards is available at https://www.soccer-net.org.

Action Spotting Camera Calibration +4

VIGC: Visual Instruction Generation and Correction

2 code implementations 24 Aug 2023 Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He

A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.

Hallucination Image Captioning +1

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

1 code implementation 21 Aug 2023 Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin

The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs).

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment

no code implementations 16 Aug 2023 Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li

Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.

Anomaly Segmentation Autonomous Driving +1

Purely Speckled Intensity Images Need for SAR Despeckling with SDS-SAR

no code implementations 11 Aug 2023 Liang Chen, Yifei Yin, Hao Shi, Jingfei He, Wei Li

To address these challenges, we propose a Self-supervised Despeckling Strategy for SAR images (SDS-SAR) that relies solely on speckled intensity data for training.

Sar Image Despeckling

Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation

no code implementations ICCV 2023 Siao Liu, Zhaoyu Chen, Yang Liu, Yuzheng Wang, Dingkang Yang, Zhile Zhao, Ziqing Zhou, Xie Yi, Wei Li, Wenqiang Zhang, Zhongxue Gan

In particular, CG2A develops a Gradient Agreement Solver to adaptively balance the varying gradient magnitudes, and introduces a Soft Gradient Surgery strategy to alleviate the gradient conflicts.

reinforcement-learning

Adaptive Graph Convolution Networks for Traffic Flow Forecasting

1 code implementation 7 Jul 2023 Zhengdao Li, Wei Li, Kai Hwang

The AGC-net is constructed by the Adaptive Graph Convolution (AGC) based on a novel context attention mechanism, which consists of a set of graph wavelets with various learnable scales.

NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird's-Eye-View and BDD-Map Benchmark

no code implementations 7 Jun 2023 Xi Zhu, Xiya Cao, Zhiwei Dong, Caifa Zhou, Qiangbo Liu, Wei Li, Yongliang Wang

We also provide a new scene-level BEV map evaluation setting along with the corresponding baseline for a more comprehensive comparison.

Autonomous Driving Time Series

Balancing Logit Variation for Long-tailed Semantic Segmentation

1 code implementation CVPR 2023 Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.

Semantic Segmentation

Contextual Object Detection with Multimodal Large Language Models

1 code implementation 29 May 2023 Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy

Moreover, we present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction.

Cloze Test Decoder +7

On the Value of Myopic Behavior in Policy Reuse

no code implementations 28 May 2023 Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

1 code implementation 23 May 2023 Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model to learn similar cross-domain features.

Domain Generalization Semantic Segmentation

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

no code implementations 19 May 2023 Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

Specifically, we first pre-train the model using a reconstruction loss function, by masking phones and their durations jointly on a large amount of unlabeled speech and text prompts.

Self-Supervised Learning

PaLM 2 Technical Report

1 code implementation 17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Code Generation Common Sense Reasoning +6

High-Resolution Remote Sensing Bitemporal Image Change Detection Based on Feature Interaction and Multitask Learning

1 code implementation 2023 Chunhui Zhao, Yingjie Tang, Shou Feng, Yuanze Fan, Wei Li, Ran Tao, Lifu Zhang

With the development of remote sensing technology, high-resolution (HR) remote sensing optical images have gradually become the main source of change detection data.

Change Detection Domain Adaptation

Mlinear: Rethink the Linear Model for Time-series Forecasting

no code implementations 8 May 2023 Wei Li, Xiangxu Meng, Chuhao Chen, Jianing Chen

In this paper, we carefully examine the opposing properties of CI and CD, and raise a practical question that has not been effectively answered, e.g., "How to effectively mix the CI and CD properties of time series to achieve better predictive performance?"

Philosophy Time Series +1

TransHP: Image Classification with Hierarchical Prompting

1 code implementation NeurIPS 2023 Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification Image Classification

Adaptive Mask Sampling and Manifold to Euclidean Subspace Learning with Distance Covariance Representation for Hyperspectral Image Classification

1 code implementation IEEE Transactions on Geoscience and Remote Sensing 2023 Mingsong Li, Wei Li, Yikun Liu, Yuwen Huang, Gongping Yang

Subsequently, based on distance covariance descriptor, a dual channel distance covariance representation (DC-DCR) module is proposed for modeling unified spectral-spatial feature representations and exploring spectral-spatial relationships, especially linear and nonlinear interdependence in spectral domain.

Ranked #1 on Hyperspectral Image Classification on Indian Pines (OA@5%perclass metric)

Hyperspectral image analysis Hyperspectral Image Classification +1

Siamese DETR

1 code implementation CVPR 2023 Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng

In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.

MULTI-VIEW LEARNING Representation Learning
